Bibliography

1
David Barber and Christopher M. Bishop.
Ensemble learning in Bayesian neural networks.
In Christopher M. Bishop, editor, Neural Networks and Machine Learning, Proceedings of the NATO Advanced Study Institute on Generalization in Neural Networks and Machine Learning, Berlin, 1998. Springer-Verlag.

2
L. Mark Berliner, Christopher K. Wikle, and Noel Cressie.
Long-lead prediction of Pacific SST via Bayesian dynamic modelling.
Journal of Climate, 13:3953-3968, 2000.

3
José M. Bernardo and Adrian F.M. Smith.
Bayesian Theory.
John Wiley & Sons, 1994.

4
Christopher M. Bishop.
Neural Networks for Pattern Recognition.
Oxford University Press, New York, N.Y., 1995.

5
B.J.N. Blight and L. Ott.
A Bayesian approach to model inadequacy for polynomial regression.
Biometrika, 62:79-88, 1975.

6
Léon Bottou.
Online learning and stochastic approximations.
In On-Line Learning in Neural Networks [68], pages 9-42.

7
D. S. Broomhead and D. Lowe.
Multivariable functional interpolation and adaptive networks.
Complex Systems, 2:321-355, 1988.

8
Christopher J.C. Burges.
A tutorial on support vector machines for pattern recognition.
Data Mining and Knowledge Discovery, 2(2):121-167, 1998.

9
Colin Campbell, Nello Cristianini, and Alexander J. Smola.
Query learning with large margin classifiers.
In Proceedings of the 17th International Conference on Machine Learning, pages 111-118, 2000.

10
Gert Cauwenberghs and Tomaso Poggio.
Incremental and decremental Support Vector Machine learning.
In Todd K. Leen, Thomas G. Dietterich, and Volker Tresp, editors, NIPS, volume 13. The MIT Press, 2001.

11
S. Chen, D. Donoho, and M. Saunders.
Atomic decomposition by basis pursuit.
Technical Report 479, Department of Statistics, Stanford University, 1995.

12
Shaobing S. Chen.
Basis Pursuit.
PhD thesis, Department of Statistics, Stanford University, November 1995.

13
D. Cornford, I.T. Nabney, and C.K.I. Williams.
Modelling frontal discontinuities in wind fields.
Nonparametric Statistics, 2002.
Accepted for publication.

14
Thomas M. Cover and Joy A. Thomas.
Elements of Information Theory.
John Wiley & Sons, 1991.

15
Noel A.C. Cressie.
Statistics for Spatial Data.
John Wiley & Sons, New York, 1993.

16
Lehel Csató, Dan Cornford, and Manfred Opper.
Online learning of wind-field models.
In International Conference on Artificial Neural Networks, pages 300-307, 2001.

17
Lehel Csató, Ernest Fokoué, Manfred Opper, Bernhard Schottky, and Ole Winther.
Efficient approaches to Gaussian process classification.
In Sara A. Solla, Todd K. Leen, and Klaus-Robert Müller, editors, NIPS, volume 12, pages 251-257. The MIT Press, 2000.

18
Lehel Csató and Manfred Opper.
Sparse representation for Gaussian process models.
In Todd K. Leen, Thomas G. Dietterich, and Volker Tresp, editors, NIPS, volume 13, pages 444-450. The MIT Press, 2001.

19
Lehel Csató and Manfred Opper.
Sparse on-line Gaussian processes.
Neural Computation, 14(3):641-668, 2002.

20
J.F.G. de Freitas, Mahesan Niranjan, and A.H. Gee.
Hierarchical Bayesian Kalman models for regularisation and ARD in sequential learning.
Technical report, Cambridge University, Engineering Department, http://svr-www.eng.cam.ac.uk/reports/people/niranjan.html, 1998.

21
Luc Devroye, László Györfi, and Gábor Lugosi.
A Probabilistic Theory of Pattern Recognition.
Number 31 in Applications of mathematics. Springer, New York, 1996.

22
D.J. Evans, D. Cornford, and I.T. Nabney.
Structured neural network modelling of multi-valued functions for wind retrieval from scatterometer measurements.
Neurocomputing Letters, 30:23-30, 2000.

23
J. H. Friedman.
Multivariate adaptive regression splines.
Annals of Statistics, 19:1-141, 1991.

24
A. Gelb, J.F. Kasper, R.A. Nash, C.F. Price, and A.A. Sutherland.
Applied Optimal Estimation.
The MIT Press, Cambridge, MA, 1974.

25
Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin.
Bayesian Data Analysis.
Chapman & Hall, 1995.

26
Mark Gibbs and David J.C. MacKay.
Efficient implementation of Gaussian processes.
Technical report, Department of Physics, Cavendish Laboratory, Cambridge University, 1997.
http://wol.ra.phy.cam.ac.uk/mackay.

27
Mark Gibbs and David J.C. MacKay.
Variational Gaussian process classifiers.
Technical report, Department of Physics, Cavendish Laboratory, Cambridge University, 1999.
http://wol.ra.phy.cam.ac.uk/mackay/abstracts/gpros.html.

28
Mark Girolami.
Orthogonal series density estimation and the kernel eigenvalue problem.
Neural Computation, 14(3):669-688, 2002.

29
R. P. Gorman and T. J. Sejnowski.
Analysis of the hidden units in layered networks trained to classify sonar targets.
Neural Networks, 1:75-89, 1988.

30
I.S. Gradshteyn and I.M. Ryzhik.
Table of Integrals, Series, and Products.
Academic Press, New York, 1994.

31
Simon Haykin.
Neural Networks: A Comprehensive Foundation.
Macmillan, New York, 1994.

32
Ralf Herbrich, Thore Graepel, and Colin Campbell.
Bayes point machines.
Journal of Machine Learning Research, 1:245-279, 2001.

33
Geoffrey E. Hinton and Drew van Camp.
Keeping the neural networks simple by minimizing the description length of the weights.
In 6th Conference on Computational Learning Theory, pages 5-13, 1993.

34
Timothy E. Holy.
The analysis of data from continuous probability distributions.
http://xxx.arXiv.org/ps/physics/9706015, pages 1-8, 1997.

35
Peter J. Huber.
Projection pursuit.
The Annals of Statistics, 13(2):435-475, 1985.

36
Tommi Jaakkola and David Haussler.
Probabilistic kernel regression.
In Online Proceedings of the 7th International Workshop on AI and Statistics. http://uncertainty99.microsoft.com/proceedings.htm, 1999.

37
I. T. Jolliffe.
Principal Component Analysis.
Springer Verlag, New York, 1986.

38
R.E. Kalman.
A new approach to linear filtering and prediction problems.
Trans ASME, Journal of Basic Engineering, Ser. D, 82:35-45, 1960.

39
R.E. Kalman and R.S. Bucy.
New results in linear filtering and prediction theory.
Trans ASME, Journal of Basic Engineering, Ser. D, 83:95-108, 1961.

40
G.S. Kimeldorf and G. Wahba.
Some results on Tchebycheffian spline functions.
J. Math. Anal. Applic., 33:82-95, 1971.

41
S. Kullback.
Information Theory and Statistics.
Wiley, New York, 1959.

42
David J.C. MacKay.
Comparison of approximate methods for handling hyperparameters.
Neural Computation, 11:1035-1068, 1999.

43
K. V. Mardia, J. T. Kent, and J. M. Bibby.
Multivariate Analysis.
Academic Press, London, 1979.

44
P. McCullagh and J. A. Nelder.
Generalized Linear Models.
Chapman & Hall, London, 1989.

45
J. Mercer.
Functions of positive and negative type and their connection with the theory of integral equations.
Philos. Trans. Roy. Soc. London, A 209:415-446, 1909.

46
Thomas P. Minka.
Expectation Propagation for Approximate Bayesian Inference.
PhD thesis, Department of Electrical Engineering and Computer Science, MIT, vismod.www.media.mit.edu/~tpminka, 2000.

47
I.T. Nabney, D. Cornford, and C.K.I. Williams.
Bayesian inference for wind field retrieval.
Neurocomputing Letters, 30:3-11, 2000.

48
Ian T. Nabney, Dan Cornford, and Christopher K.I. Williams.
Structured neural network modelling of multi-valued functions for wind vector retrieval from satellite scatterometer measurements.
Neurocomputing, 30:23-30, 2000.

49
Radford M. Neal.
Regression and classification using Gaussian process priors (with discussion).
In J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, editors, Bayesian Statistics, volume 6, pages 475-501. Oxford University Press, 1997.
ftp://ftp.cs.utoronto.ca/pub/radford/mc-gp.ps.Z.

50
Ilya Nemenman and William Bialek.
Learning continuous distributions: Simulations with field theoretic priors.
In Todd K. Leen, Thomas G. Dietterich, and Volker Tresp, editors, NIPS, volume 13, pages 287-293. The MIT Press, 2001.

51
D. Offiler.
The calibration of ERS-1 satellite scatterometer winds.
Journal of Atmospheric and Oceanic Technology, 11:1002-1017, 1994.

52
Anthony O'Hagan.
Curve fitting and optimal design for prediction (with discussion).
Journal of the Royal Statistical Society, Ser. B, 40:1-42, 1978.

53
Manfred Opper and Ole Winther.
Tractable approximations for probabilistic models: the adaptive Thouless-Anderson-Palmer mean field approach.
Physical Review Letters, 86(17):3695-3698, 2001.

54
Manfred Opper.
Online versus offline learning from random examples: General results.
Phys. Rev. Lett., 77(22):4671-4674, 1996.

55
Manfred Opper.
A Bayesian approach to online learning.
In On-Line Learning in Neural Networks [68], pages 363-378.

56
Manfred Opper and David Saad, editors.
Advanced Mean Field Methods: Theory and Practice.
The MIT Press, 2001.

57
Manfred Opper and Ole Winther.
Gaussian processes and SVM: Mean field results and leave-one-out estimator.
In Smola et al. [77], pages 43-65.

58
Manfred Opper and Ole Winther.
Gaussian processes for classification: Mean-field algorithms.
Neural Computation, 12:2655-2684, 2000.

59
Edgar E. Osuna and Federico Girosi.
Reducing the run-time complexity in Support Vector Machines.
In Schölkopf et al. [71], pages 271-284.

60
Y.C. Pati, R. Rezaiifar, and P.S. Krishnaprasad.
Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition.
In 27th Annual Asilomar Conference on Signals, Systems and Computers, citeseer.nj.nec.com/pati93orthogonal.html, November 1993.

61
John C. Platt.
Fast training of Support Vector Machines using sequential minimal optimisation.
In Schölkopf et al. [71], pages 185-208.

62
John C. Platt.
Probabilities for Support Vector Machines.
In Smola et al. [77], pages 61-74.

63
Tomaso Poggio and Federico Girosi.
Networks for approximation and learning.
Proceedings of the IEEE, 78(9):1481-1497, 1990.

64
W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling.
Numerical Recipes in C.
Cambridge University Press, Cambridge, second edition, 1992.

65
Carl Edward Rasmussen and Zoubin Ghahramani.
Infinite mixtures of Gaussian process experts.
In Thomas G. Dietterich, Suzanna Becker, and Zoubin Ghahramani, editors, NIPS, volume 14, http://nips.cc, 2002. The MIT Press.

66
B.D. Ripley.
Pattern Recognition and Neural Networks.
Cambridge University Press, Cambridge, UK, 1996.

67
Sam Roweis.
EM algorithms for PCA and SPCA.
In Michael I. Jordan, Michael J. Kearns, and Sara A. Solla, editors, NIPS, volume 10. The MIT Press, 1998.

68
David Saad, editor.
On-Line Learning in Neural Networks.
Cambridge University Press, 1998.

69
David M. Schmidt.
Continuous probability distributions from finite data.
http://xxx.arXiv.org/ps/physics/9808005, pages 1-8, 1998.

70
Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller.
Nonlinear component analysis as a kernel eigenvalue problem.
Neural Computation, 10:1299-1319, 1998.

71
Bernhard Schölkopf, Christopher J.C. Burges, and Alexander J. Smola, editors.
Advances in Kernel Methods: Support Vector Learning.
The MIT Press, 1999.

72
Bernhard Schölkopf, Ralf Herbrich, and Alexander J. Smola.
A generalized representer theorem.
In Fourteenth Annual Conference on Computational Learning Theory, 2001. In press.

73
Bernhard Schölkopf, Sebastian Mika, Christopher J.C. Burges, Philipp Knirsch, Klaus-Robert Müller, Gunnar Rätsch, and Alexander J. Smola.
Input space vs. feature space in kernel-based methods.
IEEE Transactions on Neural Networks, 10(5):1000-1017, 1999.

74
Matthias Seeger.
Bayesian model selection for Support Vector Machines, Gaussian processes and other kernel classifiers.
In Sara A. Solla, Todd K. Leen, and Klaus-Robert Müller, editors, NIPS, volume 12. The MIT Press, 2000.

75
H. S. Seung, Manfred Opper, and Haim Sompolinsky.
Query by committee.
In Computational Learning Theory, pages 287-294, citeseer.nj.nec.com/seung92query.html, 1992.

76
Yoram Singer and Manfred Warmuth.
A new parameter estimation method for Gaussian mixtures.
In Michael J. Kearns, Sara A. Solla, and David A. Cohn, editors, NIPS, volume 11. The MIT Press, 1999.

77
A.J. Smola, P. Bartlett, B. Schölkopf, and C. Schuurmans, editors.
Advances in Large Margin Classifiers.
The MIT Press, Cambridge, MA, 1999.

78
Alexander J. Smola and Peter Bartlett.
Sparse greedy Gaussian process regression.
In Todd K. Leen, Thomas G. Dietterich, and Volker Tresp, editors, NIPS, volume 13, pages 619-625. The MIT Press, 2001.

79
Alexander J. Smola and Bernhard Schölkopf.
Sparse greedy matrix approximation for machine learning.
In Proceedings of ICML'2000, pages 911-918, San Francisco, CA, 2000. Morgan Kaufmann.

80
Alexander J. Smola and Bernhard Schölkopf.
Sparse greedy Gaussian process regression.
In Todd K. Leen, Thomas G. Dietterich, and Volker Tresp, editors, NIPS, volume 13, pages 619-625. The MIT Press, 2001.

81
Alexander J. Smola and Bernhard Schölkopf.
Learning with Kernels.
MIT Press, 2002.

82
Peter Sollich.
Learning from minimum entropy queries in a large committee machine.
Physical Review E, 53:R2060-R2063, 1996.

83
Peter Sollich.
Probabilistic interpretation and Bayesian methods for Support Vector Machines.
In International Conference on Artificial Neural Networks, pages 91-96, Edinburgh, 1999.

84
A. Stoffelen and D. Anderson.
Ambiguity removal and assimilation of scatterometer data.
Quarterly Journal of the Royal Meteorological Society, 123:491-518, 1997.

85
A.N. Tikhonov.
Solution of incorrectly formulated problems and the regularization method.
Soviet Math. Dokl., 4:1035-1038, 1963.

86
Michael Tipping.
The Relevance Vector Machine.
In Sara A. Solla, Todd K. Leen, and Klaus-Robert Müller, editors, NIPS, volume 12, pages 652-658. The MIT Press, 2000.

87
Michael E. Tipping.
Sparse Bayesian learning and the relevance vector machine.
Journal of Machine Learning Research, 1:211-244, 2001.

88
Michael E. Tipping.
Sparse kernel principal component analysis.
In Todd K. Leen, Thomas G. Dietterich, and Volker Tresp, editors, NIPS, volume 13. The MIT Press, 2001.

89
Michael E. Tipping and Christopher M. Bishop.
Probabilistic principal component analysis.
Journal of the Royal Statistical Society, Series B, 61(3):611-622, 1999.

90
Giancarlo Ferrari-Trecate, Christopher K. I. Williams, and Manfred Opper.
Finite-dimensional approximation of Gaussian processes.
In Michael J. Kearns, Sara A. Solla, and David A. Cohn, editors, NIPS, volume 11. The MIT Press, 1999.

91
Volker Tresp.
A Bayesian committee machine.
Neural Computation, 12(11):2719-2741, 2000.

92
Vladimir Vapnik.
Three remarks on the Support Vector method of function estimation.
In Schölkopf et al. [71], pages 25-42.

93
Vladimir N. Vapnik.
The Nature of Statistical Learning Theory.
Springer-Verlag, New York, NY, 1995.

94
Pascal Vincent and Yoshua Bengio.
Kernel matching pursuit.
Technical Report 1179, Département d'Informatique et Recherche Opérationnelle, Université de Montréal, 2000.

95
G. Wahba.
Spline Models for Observational Data.
CBMS-NSF Regional Conference Series in Applied Mathematics, Vol. 59. SIAM, Philadelphia, 1990.

96
Grace Wahba, Xiwu Lin, Fangyu Gao, Dong Xiang, Ronald Klein, and Barbara Klein.
The bias-variance tradeoff and the randomized GACV.
In Michael J. Kearns, Sara A. Solla, and David A. Cohn, editors, NIPS, volume 11, pages 620-626. The MIT Press, 1999.

97
Jason Weston, Alex Gammerman, Mark O. Stitson, Vladimir Vapnik, Volodya Vovk, and Chris Watkins.
Support Vector density estimation.
In Schölkopf et al. [71], pages 293-305.

98
Christopher K. I. Williams.
Prediction with Gaussian processes.
In Michael I. Jordan, editor, Learning in Graphical Models. The MIT Press, 1999.

99
Christopher K. I. Williams and David Barber.
Bayesian classification with Gaussian Processes.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12):1342-1351, 1998.

100
Christopher K. I. Williams and Carl Edward Rasmussen.
Gaussian processes for regression.
In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, NIPS, volume 8. The MIT Press, 1996.

101
Christopher K. I. Williams and Matthias Seeger.
Using the Nyström method to speed up kernel machines.
In Todd K. Leen, Thomas G. Dietterich, and Volker Tresp, editors, NIPS, volume 13. The MIT Press, 2001.

102
Christopher K.I. Williams.
Computation with infinite networks.
In Michael C. Mozer, Michael I. Jordan, and Thomas Petsche, editors, NIPS, volume 9. The MIT Press, 1997.

103
Huaiyu Zhu, Christopher K. I. Williams, Richard Rohwer, and Michal Morciniec.
Gaussian regression and optimal finite dimensional linear models.
Technical report, Aston University, 1997.