Applied Machine Learning
Syllabus
Machine learning, as the name suggests, is the art of "teaching"
computers to perform a specific task. These tasks vary: in the
beginning, in the 1950s, the general opinion was that the task was to
infer a function transforming multidimensional Boolean values into a
single Boolean value.
The general viewpoint in those days was that all a computer has to do
is mimic human thinking, whereby from a set of logical assessments one
draws a conclusion related to the observation.
With the advance of computer technology, research specialised into
specific tasks. Early specialisations included robotics, pattern
recognition, and computer vision.
Machine learning today is regarded as the art of building models and
algorithms for specific data, i.e. of exploiting domain-related
knowledge in order to obtain faster and more capable algorithms.
Rather than focusing on the mathematical details of the presented
algorithms, I will try to present a general approach to solving
problems with machine learning: the working principles and canonical
methods of model-driven data mining.
The exam grade for this course is composed of:

 40%: a presentation on a topic chosen in the first 6 weeks of the
 semester (about 30 minutes long, at most 40 slides);

 20%: an oral examination based on the topics of the lectures and the
 seminars presented by the students;

 40%: solutions to the laboratory examples (presenting the solutions
 to the practical problems is compulsory).

Randomness, random variables.

Generating random variables: correlated ones and uncorrelated ones.

Generative models of data. Modelling digit maps.

Principal component analysis. Displaying the principal components of faces.

Independent components: analysis using ICA.

Bayesian methods.

Kernel methods: Support Vector machines.

Bayesian methods.

Gaussian process models.
Seminars should be about 30 minutes long. They should be written in
English and contain enough definitions that an unfamiliar reader
understands most of the notation.

0. Sampling from a correlated Gaussian random variable.

The scope is to understand randomness and its use in data analysis
and generative models. Example MATLAB
file: gauss.m.
Your task is to obtain samples from a three-dimensional Gaussian
that is constrained mostly to two directions; the description is
found in the M-file.
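The sampling idea can be sketched in Python/NumPy (the course file gauss.m is in MATLAB); the covariance below is a hypothetical stand-in for the one described in the M-file, with two large eigenvalues and one tiny one:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical covariance with two large and one tiny eigenvalue, so the
# samples are constrained mostly to two directions (a noisy plane in 3-D).
eigvals = np.array([4.0, 2.0, 0.01])              # variances along the principal axes
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # a random orthonormal basis
cov = Q @ np.diag(eigvals) @ Q.T

# Sampling: x = L z with z ~ N(0, I) and L the Cholesky factor of the covariance.
L = np.linalg.cholesky(cov)
samples = rng.standard_normal((10000, 3)) @ L.T   # shape (10000, 3)

emp_cov = np.cov(samples.T)                       # should be close to cov
```

The empirical covariance of the samples should match the prescribed one up to sampling noise.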

1. Finding the eigenimages of centred USPS digits.

The USPS database (usps.mat) is a collection of bitmaps representing
handwritten digits. A test and a training set are available in the
MATLAB file.
Your task is to collect all digits "8" from the database (both train
and test) and to visualise the first three eigenimages of the digit
"8" subset. Once the eigendecomposition is done, you can use vis.m to
visualise the results. Collect the results in a document.
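The eigendecomposition step can be sketched in Python/NumPy; random data stands in below for the "8" bitmaps that the real task loads from usps.mat (16x16 pixels is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the digit "8" bitmaps: in the real task, load usps.mat
# and select that digit from both the train and the test sets.
X = rng.standard_normal((500, 256))      # 500 images, 16x16 = 256 pixels each

Xc = X - X.mean(axis=0)                  # centre: subtract the mean image
# SVD of the centred data: the rows of Vt are the eigenimages
# (eigenvectors of the covariance matrix, sorted by variance).
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
eigenimages = Vt[:3]                     # first three eigenimages, each reshapable to 16x16
eigvals = s ** 2 / (len(X) - 1)          # corresponding variances
```

Each eigenimage can then be reshaped to the bitmap size and displayed (vis.m plays that role in the MATLAB version).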

2. Independent Component Analysis of Recordings

Independent Component Analysis (ICA) is a recent technique to
separate sources based on their statistical independence. Using ICA
one can separate sources "blindly", having only their mixtures
available.
A good example of blind source separation (BSS) is the separation of
speakers in a room: assume that there are $k$ speakers, which we call
sources, $s_1,\ldots,s_k$, and $k$ microphones, called mixtures:
$x_1,\ldots,x_k$. We know that the mixtures are linear combinations
of the sources, each mixture with a different ratio of combining the
sources; we represent the weight of source $s_j$ in mixture $x_i$
with $A_{ij}$.
Your task is to write a program to separate (or find) the sources
given the mixtures in the MATLAB files.
A template of the solution is in decompose.m, which includes samples
of the useful MATLAB commands. You are advised to use the FastICA
MATLAB package (original URL: www.cis.hut.fi/projects/ica/fastica).
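The mixing model and a minimal version of the fixed-point iteration behind FastICA can be sketched in Python/NumPy; the two synthetic sources and the mixing matrix below are hypothetical stand-ins for the recordings:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

# Two independent non-Gaussian sources (stand-ins for the recorded speakers).
S = np.vstack([rng.laplace(size=n), rng.uniform(-1, 1, n)])
A = np.array([[0.8, 0.3], [0.4, 0.7]])    # hypothetical mixing matrix A_ij
X = A @ S                                 # mixtures: x_i = sum_j A_ij * s_j

# Whitening: rotate and scale the mixtures so their covariance is the identity.
Xc = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(Xc))
Z = E @ np.diag(d ** -0.5) @ E.T @ Xc

# Minimal FastICA with the tanh nonlinearity and deflation: a sketch of the
# fixed-point iteration implemented by the FastICA package referenced above.
W = np.zeros((2, 2))
for i in range(2):
    w = rng.standard_normal(2)
    w /= np.linalg.norm(w)
    for _ in range(200):
        wz = w @ Z
        w_new = (Z * np.tanh(wz)).mean(axis=1) - (1 - np.tanh(wz) ** 2).mean() * w
        w_new -= W[:i].T @ (W[:i] @ w_new)   # decorrelate from earlier components
        w_new /= np.linalg.norm(w_new)
        converged = abs(abs(w_new @ w) - 1.0) < 1e-12
        w = w_new
        if converged:
            break
    W[i] = w

S_hat = W @ Z                             # recovered sources (up to order, sign, scale)
```

The recovered components should match the true sources up to permutation, sign, and scale, which is the intrinsic ambiguity of BSS.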

3. Independent components of natural images.

In 1996 an article appeared in Nature that presented the results of
the ICA method on a collection of natural images: it claimed that the
filtering matrices are similar to the receptive fields present in the
brain.
The task is to reproduce the experiments of Olshausen and Field:
- Import the collection of images (if using MATLAB, use imread) and transform them into grayscale images. Images are available in the images directory;
- Choose dim, the size of the square patches (it is suggested to start with 10 ...);
- Extract patches and store them in a data matrix in column format. The extraction may be overlapping, i.e. select the top (or bottom) left corner of a patch randomly from a randomly selected image;
- Apply the functions from the FastICA package (see above) to find the independent components;
- Visualise the results. Compare with the results presented in the book of Hyvärinen et al. from the Literature.
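The patch-extraction step can be sketched in Python/NumPy; the random arrays below are stand-ins for the grayscale images that the real task loads from the images directory:

```python
import numpy as np

rng = np.random.default_rng(3)
dim = 10                                  # patch side length, as suggested above

# Stand-ins for the grayscale images; in the real task, load them with
# imread from the images directory and convert them to grayscale.
images = [rng.random((128, 128)) for _ in range(5)]

n_patches = 1000
patches = np.empty((dim * dim, n_patches))  # one patch per column ("column format")
for k in range(n_patches):
    img = images[rng.integers(len(images))]     # pick a random image...
    r = rng.integers(img.shape[0] - dim + 1)    # ...and a random top-left corner,
    c = rng.integers(img.shape[1] - dim + 1)    # so patches may overlap
    patches[:, k] = img[r:r + dim, c:c + dim].ravel()

patches -= patches.mean(axis=1, keepdims=True)  # centre before running FastICA
```

The centred patch matrix is then the input to the FastICA routines, whose estimated unmixing rows can be reshaped to dim-by-dim filters for visualisation.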

4. Finding clusters in data

You can find artificial three-dimensional data in the following file:
d_em.txt. The data has been generated from a number of clusters.
Using the NETLAB package, test Gaussian mixture models with different
numbers of components and find the most appropriate one.
An excellent reference is Bishop (2006), chapter 9, pp. 423-439.
Visualise the results. You may start with the code in the MATLAB file
gmm_solve.m.
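The EM fit and the model comparison can be sketched in Python/NumPy; the code below uses a spherical-covariance mixture and synthetic three-cluster data as stand-ins for NETLAB's routines and d_em.txt, and compares component counts via BIC (one possible model-selection criterion):

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in data: three well-separated 3-D clusters
# (the real task loads d_em.txt instead).
centres = np.array([[0., 0., 0.], [2., 2., 0.], [0., 2., 2.]])
X = np.vstack([rng.normal(c, 0.3, size=(100, 3)) for c in centres])
n, d = X.shape

def fit_gmm(X, K, n_iter=60):
    """EM for a spherical Gaussian mixture; returns the means and log-likelihood."""
    mu = X[rng.choice(n, K, replace=False)]       # initialise means on data points
    var = np.full(K, X.var())
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] proportional to pi_k N(x_i | mu_k, var_k I)
        d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
        logp = np.log(pi) - 0.5 * d * np.log(2 * np.pi * var) - 0.5 * d2 / var
        mx = logp.max(axis=1, keepdims=True)
        r = np.exp(logp - mx)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances from the responsibilities
        Nk = r.sum(axis=0) + 1e-12
        pi = Nk / n
        mu = (r.T @ X) / Nk[:, None]
        d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
        var = np.maximum((r * d2).sum(axis=0) / (d * Nk), 1e-6)
    logp = np.log(pi) - 0.5 * d * np.log(2 * np.pi * var) - 0.5 * d2 / var
    mx = logp.max(axis=1, keepdims=True)
    ll = (mx.ravel() + np.log(np.exp(logp - mx).sum(axis=1))).sum()
    return mu, ll

# Fit several numbers of components (a few restarts each) and compare via BIC.
bic, means = {}, {}
for K in (1, 2, 3, 4):
    mu, ll = max((fit_gmm(X, K) for _ in range(10)), key=lambda t: t[1])
    n_params = (K - 1) + K * d + K                # weights, means, variances
    bic[K] = -2 * ll + n_params * np.log(n)
    means[K] = mu
```

The component count with the lowest BIC is a reasonable choice for "the most appropriate" model; NETLAB's gmm/gmmem functions implement the full-covariance version of the same EM iteration.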

5. Exploring the Boston Housing Data.

The Boston Housing data is locally available from this LINK. One
should select models to fit the data: from linear regression to
quadratic, etc.
An example file, presented during the lectures, showing how to solve
the linear system $y = w^\top x + w_0$ is LIN_SOL.M.
The task is to analyse the Boston housing data; the problem is
detailed in the MATLAB file above:
- Construct a set of features from the Boston data, such as the bias term $x^0$ or product terms of different orders $x_{i_1} \cdots x_{i_k}$, and build the matrix of derived feature values;
- Associate a coefficient to each feature;
- Find the optimal values of the coefficients;
- Compute the errors; here you should consider the computation of the training and test errors;
- Visualise system performance (the test and training errors) against the number of parameters the linear system has.
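The steps above can be sketched in Python/NumPy on synthetic one-dimensional data (a stand-in for the Boston data, which has several inputs): build polynomial feature matrices including the bias term, solve for the least-squares coefficients, and record the training and test errors for each model size:

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic stand-in for the Boston data: one input, noisy cubic target.
x = rng.uniform(-2, 2, 120)
y = 0.5 * x ** 3 - x + rng.normal(0, 0.3, 120)
x_tr, y_tr = x[:80], y[:80]
x_te, y_te = x[80:], y[80:]

train_err, test_err = [], []
for degree in range(1, 8):
    # Derived features: bias term x^0, then x^1 ... x^degree (columns of Phi).
    Phi_tr = np.vander(x_tr, degree + 1, increasing=True)
    Phi_te = np.vander(x_te, degree + 1, increasing=True)
    # Optimal coefficients in the least-squares sense.
    w = np.linalg.lstsq(Phi_tr, y_tr, rcond=None)[0]
    train_err.append(np.mean((Phi_tr @ w - y_tr) ** 2))
    test_err.append(np.mean((Phi_te @ w - y_te) ** 2))
```

Plotting the two error lists against the number of parameters shows the usual picture: training error decreases monotonically with model size, while test error eventually rises again.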

6. Bayesian Analysis of regression

In the lectures we presented the analysis of a coin throw. We
established that we can encode prior knowledge into our decision
process. In the coin experiment we assumed that we believed in the
fairness of the coin, and encoded this belief in a prior distribution
over the ratio of heads to tails. In Bayesian (linear) regression we
believe that there is a (linear) relation between the observed
input/output pairs, and we encode it in a Gaussian prior distribution
over the parameters of the hyperplane.
The task is to devise an algorithm that updates our beliefs about the
hyperplane parameters. A template is available in the M-file
bayes_reg.m.
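The belief update can be sketched in Python/NumPy for a one-dimensional line; the prior and noise precisions and the "true" parameters below are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)

alpha, beta = 1.0, 25.0              # prior precision and (assumed known) noise precision
w_true = np.array([-0.3, 0.8])       # "true" parameters of the line y = w0 + w1*x

# Prior belief over the parameters: N(0, alpha^{-1} I).
m = np.zeros(2)
S = np.eye(2) / alpha

for _ in range(50):                  # update the belief one observation at a time
    x = rng.uniform(-1, 1)
    phi = np.array([1.0, x])         # features of the hyperplane: bias and slope
    y = w_true @ phi + rng.normal(0, beta ** -0.5)
    # Conjugate Gaussian update: the precision grows by beta * phi phi^T,
    # and the mean is pulled towards the new observation.
    P = np.linalg.inv(S)
    P_new = P + beta * np.outer(phi, phi)
    m = np.linalg.solve(P_new, P @ m + beta * y * phi)
    S = np.linalg.inv(P_new)
```

After the updates, the posterior mean m concentrates around the true parameters and the posterior covariance S shrinks, which is exactly the "updating our beliefs" the task asks for.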

7. Example of a kernel algorithm

The popular support vector classification algorithm belongs to the
family of kernel algorithms. These algorithms are linear, but in a
space that is different from the space of the inputs. Therefore, one
needs to project the inputs from the dataset into the space of
features, as done in analysing the Boston housing dataset. The
projection is then replaced with the kernel function, and the
solution to the classification algorithm is written with respect to
the kernel function.
Use a kernel method to build a classification system for the FACES database (freely available from: http://cbcl.mit.edu/cbcl/software-datasets/FaceData2.html).
You should load the training dataset and train a decision system to classify the previously unseen test dataset.
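The idea can be sketched in Python/NumPy with a kernel ridge classifier, a simple least-squares relative of the SVM (the FACES data is replaced below by two synthetic Gaussian classes); note that the feature projection never appears explicitly, only the kernel:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic two-class stand-in for the faces data: two Gaussian blobs.
X_tr = np.vstack([rng.normal(-1, 0.5, (50, 2)), rng.normal(1, 0.5, (50, 2))])
y_tr = np.hstack([-np.ones(50), np.ones(50)])
X_te = np.vstack([rng.normal(-1, 0.5, (20, 2)), rng.normal(1, 0.5, (20, 2))])
y_te = np.hstack([-np.ones(20), np.ones(20)])

def rbf(A, B, gamma=1.0):
    """RBF kernel k(a, b) = exp(-gamma ||a - b||^2): inner products in the
    feature space, computed without ever constructing the features."""
    d2 = ((A[:, None, :] - B[None]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Kernel ridge classification: solve (K + lam I) a = y on the training set,
# then predict with the sign of sum_i a_i k(x_i, x) on unseen inputs.
lam = 1e-2
a = np.linalg.solve(rbf(X_tr, X_tr) + lam * np.eye(len(X_tr)), y_tr)
pred = np.sign(rbf(X_te, X_tr) @ a)
acc = (pred == y_te).mean()
```

A support vector machine replaces the squared loss here with the hinge loss, which makes the coefficient vector a sparse, but the kernel-based structure of the solution is the same.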
Slides used during lectures:
Literature:

Michael A. Arbib (ed.):
The Handbook of Brain Theory and Neural Networks, MIT Press (2002).
(link sent in mail)

Pierre Baldi and Søren Brunak:
Bioinformatics: The Machine Learning Approach, MIT Press (1998).
This book contains useful references to an interesting
application of machine learning methods.
(link sent in mail)

Christopher M. Bishop:
Pattern Recognition and Machine Learning, Springer-Verlag (2006).
(link sent in mail)

Dana H. Ballard, Christopher M. Brown:
Computer Vision, Prentice-Hall (1982).
(download from homepage)

Thomas M. Cover, Joy A. Thomas:
Elements of Information Theory, Wiley and Sons (2006).
A good book on topics related to information theory. (link sent in mail)

Trevor Hastie, Robert Tibshirani, Jerome Friedman:
The Elements of Statistical Learning: Data Mining, Inference, and
Prediction, Springer-Verlag (2009).
Another "classic textbook", with excellent explanations.
link: book download link

Aapo Hyvärinen, Jarmo Hurri, and Patrik O. Hoyer:
Natural Image Statistics: A Probabilistic Approach to Early
Computational Vision, Springer-Verlag (2009).
A nice presentation of computational vision and ICA. (preprint version available)

Thomas Mitchell: Machine Learning, McGraw-Hill (1997).
The classic textbook in Machine Learning.
(link sent in mail)

Andrew Webb:
Statistical Pattern Recognition, Wiley and Sons (2002).
(link sent in mail)
Links
Address: Lehel _dot_ Csato _at_ cs _dot_ ubbcluj _dot_ ro