Applied Machine Learning
Machine Learning - as the name suggests - is considered to be the art
of "teaching" computers to perform a specific task. These tasks
vary: in the beginning - the 1950s - the general opinion was
that the task is to infer a function that maps a
multi-dimensional vector of Boolean values to a single Boolean value.
The prevailing view in those days was that all a computer has to
do is mimic human thinking, whereby from a set of logical
assessments one is required to draw a conclusion related to the
With the advance of computer technology, research specialised towards
specific tasks. Early specialisations included robotics, pattern
recognition, and computer vision.
Machine learning today is regarded as the art of building models and
algorithms for specific data, i.e. of exploiting specific
domain-related knowledge in order to have faster and more capable
Rather than focusing on the mathematical details of the presented
algorithms, I will try to present a general way of solving problems
with machine learning: the working principles and the
canonical methods of model-driven data mining.
The exam grade for this course is based on:
- A presentation on a topic chosen in the first 6 weeks of the semester (about 30 minutes long, maximum 40 slides).
- An oral examination based on the topics of the lectures and the seminars presented by the students.
- Solutions to the laboratory examples (presenting the solutions to the practical problems is compulsory).
Randomness, random variables.
Generating random variables - correlated ones and uncorrelated ones.
Generative models of data. Modelling digit maps.
Principal component analysis. Displaying the principal components of faces.
Independent components, analysis using ICA.
Kernel methods: support vector machines.
Gaussian process models.
Seminars should be about 30 minutes long. They should be written in
English and contain enough definitions that an unfamiliar reader
understands most of the notation.
0. Sampling from a correlated Gaussian random variable.
The aim is to understand randomness and its use in data analysis
and generative models. Example MATLAB
Your task is to obtain samples from a THREE-dimensional Gaussian
that is constrained mostly to two directions; a description is found
in the M-file.
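As a rough illustration of the idea (not the course M-file), the sketch below draws samples from a three-dimensional Gaussian whose variance is concentrated in two directions. It is written in Python with NumPy; the directions u1, u2, u3 and the variances are made-up values chosen only for the example. Samples are obtained as x = mean + C z, where z is standard normal and C is the Cholesky factor of the covariance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical orthonormal directions: u1 and u2 span the plane in which
# the samples mostly live, u3 is the "thin" direction.
u1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
u2 = np.array([0.0, 0.0, 1.0])
u3 = np.array([1.0, -1.0, 0.0]) / np.sqrt(2.0)
U = np.column_stack([u1, u2, u3])

# Illustrative variances along u1, u2, u3 (the last one is very small).
variances = np.array([4.0, 1.0, 0.01])
cov = U @ np.diag(variances) @ U.T

# x = mean + C z, with z ~ N(0, I) and C C^T = cov.
mean = np.zeros(3)
chol = np.linalg.cholesky(cov)
z = rng.standard_normal((3, 5000))
samples = mean[:, None] + chol @ z      # one sample per column

# The sample covariance should be close to the covariance we asked for.
sample_cov = np.cov(samples)
print(np.round(sample_cov, 2))
```

Projecting the samples onto u3 (u3 @ samples) should give values with a much smaller spread than the projections onto u1 or u2.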
1. Finding the eigen-images of centred USPS digits.
The USPS database (usps.mat) is a
collection of bitmaps representing handwritten digits. There is a
test and a training set available in the
Your task is to collect all digits "8"
from the database (both train and test) and to visualise the first three eigen-images of
the digit "8" subset. Once the eigen-decomposition is done, you can use vis.m to visualise the results. Collect the
results in a document.
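A minimal sketch of the eigen-image computation, in Python with NumPy. The layout of usps.mat is not described here, so the sketch uses random 16x16 bitmaps as a stand-in for the digit "8" images; the variable names are illustrative only. The eigen-images are the eigenvectors of the covariance matrix of the centred data, sorted by decreasing eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the digit "8" images: 200 random 16x16 bitmaps, one per row.
# With the real data, X would hold the flattened "8" digits instead.
n_images, height, width = 200, 16, 16
X = rng.random((n_images, height * width))

# Centre the data by subtracting the mean image.
mean_image = X.mean(axis=0)
Xc = X - mean_image

# Eigen-decomposition of the sample covariance (symmetric, hence eigh).
cov = Xc.T @ Xc / (n_images - 1)
eigvals, eigvecs = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order; reorder to descending.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# The first three eigen-images, reshaped back into bitmaps.
eigen_images = eigvecs[:, :3].T.reshape(3, height, width)
print(eigen_images.shape, eigvals[:3])
```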
2. Independent Component Analysis of Recordings
Independent Component Analysis (ICA) is a recent technique to
separate sources based on their statistical independence. Using ICA
one can separate sources "blindly" having only their
A good example of blind source separation (BSS) is the separation of
speakers in a room: assume that there are several speakers,
which we call sources, and several
microphones, whose recordings we call mixtures.
We know that the mixtures are linear
combinations of the sources, each mixture combining the sources
in a different ratio, so every source has its own weight
in every mixture.
Your task is to write a program to separate (or
find) the sources given the mixtures in the matlab files
A template of the solution is in decompose.m, which includes samples of useful MATLAB commands. You are advised to use the FastICA MATLAB package
(original URL: www.cis.hut.fi/projects/ica/fastica).
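To make the mixing model concrete, here is a small self-contained sketch in Python with NumPy. It is not the FastICA package and not decompose.m: it is a hand-written symmetric fixed-point iteration in the FastICA style with a tanh nonlinearity. The notation X = A S (mixtures X, mixing matrix A, sources S), the two synthetic sources and the entries of A are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Two independent, non-Gaussian synthetic sources (one per row).
S = np.vstack([rng.uniform(-1.0, 1.0, n), rng.laplace(size=n)])

# Hypothetical mixing matrix: each row gives the weights of one mixture.
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
X = A @ S

# Centre and whiten the mixtures so that their covariance is the identity.
Xc = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(Xc))
Z = E @ np.diag(d ** -0.5) @ E.T @ Xc

def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which has orthonormal rows."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(s ** -0.5) @ u.T @ W

# Symmetric fixed-point iteration: w <- E[z g(w.z)] - E[g'(w.z)] w.
W = sym_decorrelate(rng.standard_normal((2, 2)))
for _ in range(200):
    g = np.tanh(W @ Z)
    g_prime_mean = (1.0 - g ** 2).mean(axis=1)
    W_new = sym_decorrelate(g @ Z.T / n - np.diag(g_prime_mean) @ W)
    converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < 1e-10
    W = W_new
    if converged:
        break

S_hat = W @ Z   # estimated sources, up to order, sign and scale

# Absolute correlation between each estimate (rows) and each true source.
C = np.abs(np.corrcoef(S_hat, S))[:2, 2:]
print(np.round(C, 3))
```

ICA cannot recover the order, sign or scale of the sources, which is why the comparison uses absolute correlations rather than a direct difference.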
3. Independent components of natural images.
In 1996 an article appeared in Nature that presented the results of the ICA method on a collection of natural images: it claimed that the filtering matrices are similar to the receptive fields present in the brain.
The task is to reproduce the experiments of Olshausen and Field:
- Import the collection of images (if using Matlab, then use imread) and transform them into gray-scale images. Images are available in the images directory;
- Choose dim, the size of the squares (a suggested starting value is 10 ...);
- Extract patches and store them in a data matrix in column-format. The extraction might be overlapping, i.e. select the top (or bottom) left corner of a patch randomly from a randomly selected image.
- Apply the functions from the FastICA package (see above) to find the independent components;
- Visualise the results. Compare with the results presented in the book of Hyvärinen et al. listed in the literature.
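The patch-extraction step can be sketched as follows, in Python with NumPy. Random arrays stand in for the grey-scale images from the images directory, and the number of patches is an arbitrary choice for the example; each patch is stored as one column of the data matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the grey-scale images: three random arrays of different sizes.
images = [rng.random((64, 48)), rng.random((50, 80)), rng.random((32, 32))]

dim = 10            # side length of the square patches
n_patches = 1000    # illustrative number of patches

patches = np.empty((dim * dim, n_patches))
for k in range(n_patches):
    # Pick a random image, then a random top-left corner that keeps the
    # whole patch inside the image; patches may overlap.
    img = images[rng.integers(len(images))]
    top = rng.integers(img.shape[0] - dim + 1)
    left = rng.integers(img.shape[1] - dim + 1)
    patches[:, k] = img[top:top + dim, left:left + dim].ravel()

print(patches.shape)
```

A column can be turned back into a patch for visualisation with patches[:, k].reshape(dim, dim).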
4. Finding clusters in data
You find artificial three-dimensional data in the following file:
d_em.txt. The data has been generated with a number of
Using the NETLAB package,
test the Gaussian mixture model with different numbers of components and find the most appropriate one.
An excellent reference is Bishop (2006), chapter 9, pp. 423-439.
Visualise the results. You may start with the code in the MATLAB file gmm_solve.m.
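For orientation only, the sketch below fits Gaussian mixtures with 1 to 4 components by plain EM and scores each with the Bayesian information criterion (BIC). It is written in Python with NumPy and is neither NETLAB nor gmm_solve.m. The synthetic three-cluster data stands in for d_em.txt, and BIC is just one possible way of choosing the number of components; it is an assumption of this example, not a requirement of the task.

```python
import numpy as np

def log_gaussian(X, mean, cov):
    """Log-density of N(mean, cov) at every row of X."""
    d = X.shape[1]
    chol = np.linalg.cholesky(cov)
    sol = np.linalg.solve(chol, (X - mean).T)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(sol ** 2, axis=0))

def log_joint(X, weights, means, covs):
    """Matrix with entries log(weight_j * N(x_i | mean_j, cov_j))."""
    return np.column_stack([
        np.log(w) + log_gaussian(X, m, c) for w, m, c in zip(weights, means, covs)
    ])

def fit_gmm(X, k, n_iter=100, seed=0):
    """Plain EM for a full-covariance mixture; returns the log-likelihood."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, size=k, replace=False)]
    covs = np.array([np.cov(X.T) for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        log_p = log_joint(X, weights, means, covs)
        resp = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means and covariances.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            # Small ridge keeps the covariance positive definite.
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
    return np.sum(np.logaddexp.reduce(log_joint(X, weights, means, covs), axis=1))

def bic(log_lik, k, n, d):
    """BIC with the parameter count of a full-covariance mixture."""
    n_params = (k - 1) + k * d + k * d * (d + 1) // 2
    return -2.0 * log_lik + n_params * np.log(n)

# Synthetic stand-in data: three well-separated clusters in 3-D.
rng = np.random.default_rng(1)
centres = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 0.0], [0.0, 5.0, 5.0]])
X = np.vstack([c + rng.standard_normal((100, 3)) for c in centres])

scores = {}
for k in range(1, 5):
    scores[k] = bic(fit_gmm(X, k), k, *X.shape)
    print(k, round(scores[k], 1))
```

The number of components with the lowest score would be the candidate to inspect further. Since EM depends on its initialisation, several seeds should be tried before drawing conclusions.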
5. Exploring the Boston Housing Data.
The Boston Housing data is locally available from this LINK. One should select models to fit
the data: from linear regression to quadratic, etc.
An example file, presented during the lectures, that shows how to
solve the linear system is LIN_SOL.M.
The task is to analyse the Boston housing data; the problem is
detailed in the MATLAB file above:
- Construct a set of features from the Boston data, such as the bias term or product terms of different orders, and build the matrix of derived feature values;
- Associate a coefficient with each feature;
- Find the optimal values of the coefficients;
- Compute the errors -- here you should consider the computation of the training and test errors;
- Visualise system performance - the test and training errors - against the number of parameters the linear system has.
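The steps above can be sketched on synthetic data as follows (Python with NumPy, not LIN_SOL.M). For brevity the example uses a single made-up input with a noisy quadratic target instead of the Boston data, so the derived features are simply the powers 1, x, x^2, ...; with several inputs the feature matrix would also contain product terms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the housing data: one input, noisy quadratic target.
x = rng.uniform(-1.0, 1.0, size=200)
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + 0.1 * rng.standard_normal(200)

x_train, y_train = x[:150], y[:150]
x_test, y_test = x[150:], y[150:]

def features(x, order):
    """Matrix of derived features with columns 1, x, x^2, ..., x^order."""
    return np.vander(x, order + 1, increasing=True)

def mse(Phi, w, y):
    """Mean squared error of the linear model Phi @ w."""
    return np.mean((Phi @ w - y) ** 2)

train_errs, test_errs = [], []
for order in range(6):
    Phi_train = features(x_train, order)
    # Least-squares coefficients minimising ||Phi w - y||^2.
    w, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)
    train_errs.append(mse(Phi_train, w, y_train))
    test_errs.append(mse(features(x_test, order), w, y_test))
    print(f"{order + 1} parameters: "
          f"train {train_errs[-1]:.4f}, test {test_errs[-1]:.4f}")
```

Because each model contains the previous one, the training error cannot increase as parameters are added; the test error is the quantity that reveals over-fitting.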
6. Bayesian Analysis of regression
In the lectures we presented the analysis of a coin throw. We
established that we can encode prior knowledge
into our decision process. In the coin experiment we assumed
that we believed in the fairness of the coin and
encoded this belief in a prior
distribution over the ratio of heads to tails. In
Bayesian (linear) regression we believe that there is a (linear)
relation between the observed input/output pairs, and we encode this belief in
a Gaussian prior distribution of the parameters of
The task is to devise an algorithm that updates our beliefs about the
hyperplane parameters. A template is available in the M-file:
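The following is not the course template, only a rough sketch of one possible update rule, written in Python with NumPy. It assumes a Gaussian belief N(mean, cov) over the parameters w, observations of the form y = w . phi + noise, and a known noise variance; the true parameters and the data are synthetic and made up for the example. Each observation updates the belief with a rank-one correction.

```python
import numpy as np

def update(mean, cov, phi, y, noise_var):
    """Update the Gaussian belief N(mean, cov) after observing (phi, y)."""
    cov_phi = cov @ phi
    gain = cov_phi / (noise_var + phi @ cov_phi)
    new_mean = mean + gain * (y - phi @ mean)
    new_cov = cov - np.outer(gain, cov_phi)
    return new_mean, new_cov

rng = np.random.default_rng(0)

# Hypothetical "true" bias and slope, used only to generate data.
w_true = np.array([0.5, -1.5])
noise_var = 0.04

xs = rng.uniform(-1.0, 1.0, 100)
Phi = np.column_stack([np.ones_like(xs), xs])   # features: bias term and x
ys = Phi @ w_true + np.sqrt(noise_var) * rng.standard_normal(100)

# Prior belief: zero mean, unit covariance.
mean, cov = np.zeros(2), np.eye(2)
for phi, y in zip(Phi, ys):
    mean, cov = update(mean, cov, phi, y, noise_var)

print("posterior mean:", np.round(mean, 3))
print("posterior std :", np.round(np.sqrt(np.diag(cov)), 3))
```

Processing the observations one at a time gives the same posterior as the batch formula with all data at once, which is a useful check when writing your own solution.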
7. Example of a kernel algorithm
The popular support vector classification algorithm belongs to the
family of kernel algorithms. These algorithms are linear, but in a
space that is different from the space of the inputs. Therefore,
one needs to project the inputs from the data-set to the
space of features, as done in analysing the Boston
housing data-set. The projection is then replaced with the
kernel function and the solution to the classification
algorithm is written with respect to the kernel function.
Use a kernel method to build a classification system for the FACES data-base (freely available from: http://cbcl.mit.edu/cbcl/software-datasets/FaceData2.html).
You should load the training data-set and train a decision system to classify the previously unseen test data-set.
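As an illustration of a solution written purely in terms of the kernel function, the sketch below uses kernel ridge regression as a classifier, in Python with NumPy. This is a simpler relative of the support vector machine, not the SVM itself. The two-ring data is a synthetic stand-in for the FACES data, and the Gaussian (RBF) kernel, gamma and the regularisation constant lam are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||a - b||^2) for all rows of A and B."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * sq)

def make_rings(n):
    """Class +1 inside the unit disc, class -1 on a surrounding ring."""
    labels = rng.choice([-1.0, 1.0], size=n)
    radius = np.where(labels > 0,
                      rng.uniform(0.0, 1.0, n),
                      rng.uniform(1.5, 2.5, n))
    angle = rng.uniform(0.0, 2.0 * np.pi, n)
    X = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
    return X, labels

X_train, y_train = make_rings(200)
X_test, y_test = make_rings(100)

# Training: solve (K + lam I) alpha = y; only kernel values are needed.
lam = 0.1
K = rbf_kernel(X_train, X_train)
alpha = np.linalg.solve(K + lam * np.eye(len(K)), y_train)

# Prediction: the sign of a kernel expansion over the training points.
pred = np.sign(rbf_kernel(X_test, X_train) @ alpha)
accuracy = np.mean(pred == y_test)
print("test accuracy:", accuracy)
```

The two classes are not linearly separable in the input space, yet the classifier is linear in the coefficients alpha; the feature space is never built explicitly.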
Slides used during lectures:
Michael A. Arbib (ed.):
The Handbook of Brain Theory and Neural Networks, MIT Press (2002).
(link sent in mail)
Pierre Baldi and Soeren Brunak:
Bioinformatics: the Machine Learning Approach, MIT Press (1998).
This book contains useful references to an interesting
application of machine learning methods.
(link sent in mail)
Christopher M. Bishop:
Pattern Recognition and Machine Learning, Springer-Verlag (2006).
(link sent in mail)
Dana H. Ballard, Christopher M. Brown:
Computer Vision, Prentice-Hall (1982).
(download from homepage)
Thomas M. Cover, Joy A. Thomas:
Elements of Information Theory, Wiley and Sons (2006).
A good book on topics related to information theory. (link sent in mail)
Trevor Hastie, Robert Tibshirani, Jerome Friedman:
The Elements of Statistical Learning: Data Mining, Inference, and
Prediction, Springer-Verlag (2009).
Another "classic textbook", excellent explanations.
link: book download link
Aapo Hyvärinen, Jarmo Hurri, and Patrik O. Hoyer:
Natural Image Statistics:
A Probabilistic Approach to Early Computational Vision, Springer-Verlag (2009).
A nice presentation of the computational vision and the ICA. (preprint version available)
Thomas Mitchell: Machine Learning, McGraw-Hill (1997).
The classic textbook in Machine Learning.
(link sent in mail)
Andrew Webb: Statistical Pattern Recognition, Wiley and Sons (2002).
(link sent in mail)
Address: Lehel _dot_ Csato _at_ cs _dot_ ubbcluj _dot_ ro