Demo program

Quadratic (squared-loss) regression is the simplest case of GP inference. By changing the likelihood function one can implement robust inference (where the likelihood is e.g. Laplace, corresponding to a 1-norm loss) or binary classification; for details see the other demos provided by the OGP package.
The code below is a simplified version of the file demogp_reg from the OGP software package. The program performs inference with a Gaussian likelihood, i.e. it performs quadratic regression.
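As a quick illustration of the losses these likelihoods correspond to (a sketch, not part of the package): up to additive constants, the negative log-likelihoods are

quadLoss = @(y,f,sig2) (y - f).^2 ./ (2*sig2);   % Gaussian likelihood: quadratic (2-norm) loss
absLoss  = @(y,f,b)    abs(y - f) ./ b;          % Laplace likelihood (scale b): absolute (1-norm) loss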

An artificial data set is used for inference: the noisy sinc function sampled on [-10,10]. First we define the global structures holding the GP and the various constants used for data generation and inference:
global net gpopt ep

nType  = 'GAUSS';    % type of the noise
sig02  = 3;          % noise variance used when generating the training data
allBV  = 10:10:60;   % BV set sizes
totExp = 1;          % number of experiments for EACH BV set size
nTest  = 201;        % test data size
nTrain = 25;         % training data size
Generating the noise-free test data:
[xTest,yTest] = sincdata(nType,nTest,0,[],1);
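The helper sincdata is shipped with the package; as a rough, hypothetical stand-in (with a simplified argument list, for illustration only), noisy sinc data on [-10,10] could be produced as:

% Hypothetical substitute for sincdata: n noisy samples of sinc(x) on [-10,10]
n    = 25;                        % number of points
sig2 = 3;                         % noise variance (0 would give noise-free data)
x    = 20*rand(n,1) - 10;         % inputs drawn uniformly from [-10,10]
y    = sin(x)./x;  y(x==0) = 1;   % sinc(x), with its limit 1 at x = 0
y    = y + sqrt(sig2)*randn(n,1); % additive zero-mean Gaussian noise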
Repeating the experiment for each BV set size:
indBV = 0;
for maxB = allBV;
  indBV = indBV + 1;
  for bInd= 1:totExp;	     % iteration for different datasets
    % training data
    [xTrain, yTrain] = sincdata(nType,nTrain,sig02,[],0);
First, the OGP structure which holds the inferred GP is initialised. The first call creates the OGP structure with an RBF (squared exponential) kernel; the second sets the likelihood for the problem.
    ogp(1,1, ...             % input/output dimension
        'sqexp', ...         % kernel type
        log([1/5 100 1]));   % kernel parameters
    ogpinit(@c_reg_gauss,...
            0.1, ...         % noise variance
            @em_gauss);      % to adjust the likelihood variance
The first kernel parameter is the scaling of the inputs, here 1/5 (i.e. a characteristic input scale of 5). The second parameter is the amplitude of the Gaussian process (100), and the last one would be the order of the kernel; it is not used for the squared exponential kernel.
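As a rough illustration of these roles (assuming the usual squared exponential form; the exact parameterisation inside the package may differ slightly):

% Illustration only: a squared exponential covariance with the values above
scale = 1/5;  amp = 100;                         % exp() of the log-parameters; the order is unused
kSE   = @(x,xp) amp*exp(-0.5*scale*(x-xp).^2);   % larger 'scale' means faster decay with distance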
net.thresh is a small value, the BV addition threshold; it controls the "degree of singularity" allowed in the OGP structure, essentially the level at which two Basis Vectors are considered identical.
    net.thresh = 1e-5;       % the admission threshold for new BVs
    net.maxBV  = maxB;       % the maximal size of BV set
    ogphypcovpar(1e-2);
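To make the role of the threshold concrete, here is a toy sketch of the admission test used in sparse online GP algorithms (illustration only, with a made-up kernel; not the package's internal code): a candidate input is added as a new BV only if its "novelty", the residual left after projecting its feature-space image onto the span of the current BVs, exceeds the threshold.

% Toy illustration of the BV admission test (not the package's internal code)
k      = @(x,xp) exp(-0.5*(x-xp).^2);   % a made-up kernel, just for this example
BV     = [-3; 0; 4];                    % current Basis Vectors
xNew   = 0.05;                          % candidate input, very close to an existing BV
KB     = k(BV,BV');                     % kernel matrix of the BV set
kx     = k(BV,xNew);                    % kernel vector between the BVs and the candidate
gamma  = k(xNew,xNew) - kx'*(KB\kx);    % 'novelty' of xNew: residual of its projection
addNew = gamma > 1e-5;                  % a tiny gamma means xNew is almost in the span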
The function ogphypcovpar initialises hyperpriors on the Gaussian process hyperparameters; these are used to make the algorithm more stable.
The learning itself depends on several parameters which are contained in the structure gpopt. Its fields have to be initialised prior to the learning stage; a set of default options is put into gpopt by the function defoptions.
The indicator isep says that the iterative (TAP/EP) extension to the online algorithm will be used, itn specifies the number of sweeps through the data, while fixitn controls additional sweeps during which the BV set is kept fixed. Another indicator, isem, says that an EM algorithm is used to infer the likelihood parameter (the noise variance); finally, a function handle specifies the function used for the hyperparameter update (here em_gauss).
    gpopt = defoptions;       % setting defaults
    gpopt.postopt.isep  = 1;  % USING the TAP/EP algorithm
    gpopt.postopt.itn   = 2;  % number of EP iterations with a CHANGING BV set
    gpopt.postopt.fixitn= 1;  % FIXING the BV set.
The learning procedure itself alternates between the search for the optimal GP with fixed hyper-parameters (E-step) and the search for optimal hyper-parameters (M-step), followed by a resetting of the OGP structure. The reset is mainly for the stability of the inference: the OGP parameters were estimated under the previous hyper-parameters, so changing the hyper-parameters invalidates the previously inferred posterior.
    maxHyp = 4;
    for iHyp = 1:maxHyp;
      ogptrain(xTrain,yTrain);
      ogpreset;
      ogppost(xTrain,yTrain);
    end;
Notice the difference between ogptrain and ogppost: the first function trains the Gaussian process, i.e. it obtains an optimal set of hyperparameters, while the second computes the posterior for a fixed set of hyperparameters.
The rest of the code is the graphical visualisation of the posterior GP. First, the posterior mean and variance at the test locations are obtained using ogpfwd, then plotted together with the true function, the noisy training data, and the locations of the Basis Vectors.
    [meanT, varT] = ogpfwd(xTest);   % posterior mean and variance at the test inputs
    stdT          = sqrt(varT);
    meanBV        = ogpfwd(net.BV);  % posterior mean at the BV locations
    figure; clf; hold on; box on;
    plot(xTest,yTest);               % true (noise-free) function
    plot(xTest,meanT,'k--');         % posterior mean
    plot(xTest,meanT + stdT,'r-.');  % one standard deviation above the mean
    plot(xTest,meanT - stdT,'r-.');  % ... and below
    plot(net.BV,meanBV,'k*');        % Basis Vector locations
    plot(xTrain,yTrain,'r.');        % noisy training data
Finally, the two for loops are closed.
  end;
end;

Observe that in plotting the posterior process we used the function ogpfwd, which returns the mean and variance of the posterior process at the given input locations.
A different way of visualising the result is to sample from it. This can be done using the function ogpsample, which returns a random sample from the Gaussian posterior (see the image about posterior samples in the Glossary).
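For instance, a single sample could be drawn and overlaid on the earlier plot roughly as follows (a sketch only; the calling convention is assumed to mirror that of ogpfwd in the demo above, where the GP is taken from the global net):

fSample = ogpsample(xTest);   % assumed call: one random function drawn from the posterior
plot(xTest,fSample,'g-');     % overlay the sample on the earlier figure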

Questions, comments, suggestions: contact Lehel Csató.