Workshop

Kernel Based Automatic Learning

Courses

Fall 2015

Probability and Statistics

Spring 2015

 

Math 6397 Automatic Learning and Data Mining (Section# 21893)
Time: TuTh 10:00AM - 11:30AM - Room: MH 138
Instructor: R. Azencott
Prerequisites: previous familiarity (at the undergraduate level) with random variables, probability distributions, and basic statistics
Text(s):

Text Book: None

Reference Books: Reading assignments will be a small set of specific chapters extracted from the following reference texts, as well as a few published articles.

- The Elements of Statistical Learning: Data Mining, Inference, and Prediction :   T. Hastie, R. Tibshirani, J. Friedman
- Kernel Methods in Computational Biology :   B. Schölkopf, K. Tsuda, J.-P. Vert
- An Introduction to Support Vector Machines :   N. Cristianini, J. Shawe-Taylor

Description:

Automatic learning of an unknown functional relationship Y = F(X) between an output Y and high-dimensional inputs X involves algorithms dedicated to the intensive analysis of large "training sets" of N "examples" of input/output pairs (Xn, Yn), n = 1…N, in order to discover efficient "black boxes" approximating the unknown function X -> F(X). Automatic learning was first applied to emulate intelligent tasks involving complex pattern identification, in artificial vision, face recognition, sound identification, speech understanding, handwriting recognition, text classification and retrieval, etc. It has now been widely extended to the analysis of high-dimensional biological data sets in proteomics and gene interaction networks, as well as to smart mining of massive data sets gathered on the Internet.
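As a minimal illustration of this setup (a sketch in Python with NumPy, not part of the course materials; the k-nearest-neighbour rule here is just one simple choice of "black box"), one can approximate F at a new input by averaging the outputs of the closest training examples:

```python
import numpy as np

def knn_predict(X_train, Y_train, x, k=3):
    """Predict F(x) by averaging the outputs of the k nearest training inputs."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distances to all N examples
    nearest = np.argsort(dists)[:k]               # indices of the k closest inputs
    return Y_train[nearest].mean()                # average their outputs

# Toy training set: N = 200 noisy examples of Y = F(X) with F(x) = sin(x0) + x1
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
Y = np.sin(X[:, 0]) + X[:, 1] + 0.05 * rng.standard_normal(200)

pred = knn_predict(X, Y, np.array([0.5, 0.5]))   # estimate of sin(0.5) + 0.5
```

The more training examples N, the closer the black box tracks F; the course's algorithms replace this naive rule by far more powerful approximators.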

The course will study major machine learning algorithms derived from positive definite kernels and their associated Reproducing Kernel Hilbert Spaces (RKHS). We will study the implementation, performance, and drawbacks of Support Vector Machine classifiers, kernel-based nonlinear clustering, kernel-based nonlinear regression, and kernel PCA. We will explore connections between kernel-based learning, dictionary learning, and artificial neural nets, with emphasis on key conceptual features such as generalization capacity. We will present classes of positive definite kernels designed to handle the long "string descriptions" of proteins involved in genomics and proteomics.
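As a hedged sketch of kernel-based nonlinear regression (in Python with NumPy, rather than the course's own software; function names and parameter values are illustrative), a kernel ridge regressor with a Gaussian positive definite kernel solves a linear system in the training kernel matrix, and the resulting regressor lives in the kernel's RKHS:

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Positive definite Gaussian (RBF) kernel matrix K[i, j] = k(A[i], B[j])."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def fit_kernel_ridge(X, y, lam=1e-2, sigma=1.0):
    """Solve (K + lam*I) alpha = y; predictions are f(x) = sum_i alpha_i k(X_i, x)."""
    K = gaussian_kernel(X, X, sigma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return lambda x_new: gaussian_kernel(x_new, X, sigma) @ alpha

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(100)

f = fit_kernel_ridge(X, y)
estimate = f(np.array([[1.0]]))[0]   # estimate of sin(1.0)
```

The regularization weight lam controls the trade-off between fitting the training pairs and the RKHS norm of f, which is one concrete handle on generalization capacity.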

The course will focus on understanding key concepts through their mathematical formalization, as well as on computerized algorithmic implementation and intensive testing on actual data sets.

 

Spring 2014

Math 6397 Automatic Learning and Data Mining

 

Spring 2009

MATH 6397 Random Cellular Automata (Section# 27119 )

Spring 2008

 

MATH 6397: Gibbs-Markov Fields, Image Analysis and Simulated Annealing
(Section# 34194)
Time: TuTh 2:30PM - 4:00PM - Room: AH 15
Instructor: Azencott
Prerequisites: "Introduction to Probability" (MATH 6382), or instructor's consent
Text(s):

Reference books : selected chapters in

  • Pierre Brémaud : Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues, Springer TAM Vol. 31
  • Bernard Chalmond : Modeling and Inverse Problems in Image Analysis, Springer Vol. 155, 2003
Description:

Gibbs-Markov fields are probabilistic models designed to analyze large systems of "particles" or "processors" in interaction; they were introduced by physicists to model networks of small magnets.
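The "network of small magnets" picture can be sketched with a Gibbs sampler for a small Ising-type field (a minimal illustration in Python with NumPy; grid size, coupling strength, and sweep count are illustrative choices, not the course's):

```python
import numpy as np

def gibbs_ising(n=16, beta=0.6, sweeps=50, seed=0):
    """Gibbs sampling for an n x n Ising model: each site is a 'small magnet'
    with spin +/-1, coupled to its 4 grid neighbours with strength beta."""
    rng = np.random.default_rng(seed)
    s = rng.choice([-1, 1], size=(n, n))
    for _ in range(sweeps):
        for i in range(n):
            for j in range(n):
                # local field = sum of neighbouring spins (free boundary)
                h = sum(s[x, y] for x, y in [(i-1, j), (i+1, j), (i, j-1), (i, j+1)]
                        if 0 <= x < n and 0 <= y < n)
                # conditional law of site (i, j) given its neighbours (Markov property)
                p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * h))
                s[i, j] = 1 if rng.random() < p_up else -1
    return s

spins = gibbs_ising()
magnetization = abs(spins.mean())   # empirical magnetization of the sample
```

Each site update draws from the conditional distribution given the neighbouring spins only, which is exactly the Markov property of the field that the course formalizes.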

We give a self-contained presentation of the basics of Gibbs-Markov field theory and develop several applications:

  • Texture identification and segmentation for image analysis
  • Simulated annealing for stochastic minimization of "hard" combinatorial problems
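Simulated annealing on a toy "hard" combinatorial problem can be sketched as follows (a minimal travelling-salesman example in Python with NumPy; the swap move and geometric cooling schedule are illustrative choices, not the course's):

```python
import numpy as np

def anneal_tsp(points, T0=1.0, cooling=0.999, steps=20000, seed=0):
    """Simulated annealing for a travelling-salesman tour through `points`:
    propose a random 2-city swap, accept with the Metropolis rule, cool slowly."""
    rng = np.random.default_rng(seed)
    n = len(points)
    tour = rng.permutation(n)

    def length(t):
        # total length of the closed tour visiting points in order t
        return np.linalg.norm(points[t] - points[np.roll(t, 1)], axis=1).sum()

    cost, T = length(tour), T0
    for _ in range(steps):
        i, j = rng.integers(n, size=2)
        cand = tour.copy()
        cand[i], cand[j] = cand[j], cand[i]
        delta = length(cand) - cost
        # accept improvements always, uphill moves with probability exp(-delta / T)
        if delta < 0 or rng.random() < np.exp(-delta / T):
            tour, cost = cand, cost + delta
        T *= cooling   # geometric cooling schedule
    return tour, cost

pts = np.random.default_rng(2).uniform(size=(30, 2))
tour, cost = anneal_tsp(pts)
```

At high temperature the chain explores freely; as T decreases the Metropolis acceptance rule concentrates the dynamics near low-energy (short-tour) configurations.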

The main probabilistic topics in this course involve the dynamics of discrete-time Markov chains on (huge) finite spaces, Gaussian random vectors, and Monte Carlo simulation.

Students familiar with Matlab or equivalent scientific software will have the possibility to replace a large proportion of homework assignments and exams with applied projects.

 

Fall 2007

 

Math 6397: Data Mining and Auto.Learning (Section# 10591 )
Time: TuTh 1:00PM - 2:30PM - Room: 350-PGH
Instructor: Robert Azencott
Prerequisites: Basic notions of probability theory will be redefined in the course; previous familiarity with random vectors and standard probability distributions at the undergraduate level will be assumed; basic definitions concerning Hilbert spaces and the Fourier transform will be assumed to be known.
Text(s): References: selected chapters in An Introduction to Support Vector Machines, N. Cristianini and J. Shawe-Taylor, Cambridge University Press, 2000, ISBN 0-521-78019-5
Description:

Automatic learning of an unknown functional relationship Y = f(X) between multidimensional inputs X and outputs Y involves algorithms dedicated to the intensive analysis of large finite "training sets" of "examples" of input/output pairs (X, Y). In numerous applications such as artificial vision, shape recognition, sound identification, handwriting recognition, and text classification, automatic learning has become a major tool to emulate many perceptive tasks. Hundreds of machine learning algorithms have emerged in the last 10 years, along with powerful mathematical concepts focused on key learning features: generalization, accuracy, speed, robustness. The mathematics involved rely on information and probability theory, nonparametric statistics, complexity, and functional approximation. The course will present major learning architectures: Support Vector Machines and artificial neural networks, as well as clustering techniques such as Kohonen networks. Concrete applications will be presented.
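A Kohonen network of the kind mentioned above can be sketched minimally (an illustrative 1-D self-organizing map in Python with NumPy; the unit count, learning-rate and neighbourhood schedules are arbitrary choices, not the course's):

```python
import numpy as np

def train_som(data, n_units=10, epochs=30, lr0=0.5, radius0=3.0, seed=0):
    """Kohonen self-organizing map: a 1-D chain of units whose weight vectors
    are pulled towards the data, with neighbours of the winner dragged along."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(data.min(0), data.max(0), size=(n_units, data.shape[1]))
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)                # decaying learning rate
        radius = max(radius0 * (1 - epoch / epochs), 0.5)
        for x in rng.permutation(data):
            bmu = np.argmin(np.linalg.norm(w - x, axis=1))   # best-matching unit
            dist = np.abs(np.arange(n_units) - bmu)          # distance along the chain
            h = np.exp(-dist ** 2 / (2 * radius ** 2))       # neighbourhood function
            w += lr * h[:, None] * (x - w)                   # pull units towards x
    return w

data = np.random.default_rng(3).uniform(size=(200, 2))
weights = train_som(data)
```

Because neighbours of the winning unit are updated together, nearby units end up representing nearby regions of the input space, which is the clustering behaviour the course examines.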

Homework and exams: Students familiar with Matlab or equivalent scientific software will have the possibility to replace a large part of the homework assignments and exams with applied projects involving computer simulations and implementation of the algorithms taught in the course.