Data Spectroscopy: Gaussian Kernels and Harmonic Oscillators

Previously, we developed some intuition for the Radial Basis Function (RBF) / Gaussian Kernel by looking at its Fourier components. We learned that an RBF Kernel is an infinite sum of powers of the Laplacian (spatial derivatives) and that, as a Regularizer, it selects smooth solutions to an ill-posed Inverse problem.

We also learned that in physics, we use the machinery of Kernels to describe Coherent States. Coherent States form an over-complete, continuously varying, infinite basis of non-orthogonal functions, constructed from an infinite sum of the orthogonal Quantum Harmonic Oscillator (QHO) basis functions.


We now examine the Spectral Properties of the Gaussian Kernel; we will find that the eigenfunctions of the Gaussian Kernel are eigenfunctions of the Quantum Harmonic Oscillator (QHO).

We will then see how to interpret this correspondence with physics by modeling each data point as a very simple quantum mechanical particle. This work is based on some semi-recent papers by Shi et al. As usual, this may take 3-4 blogs.

For now, let’s take a deeper look at the eigenspectrum of the Gaussian Kernel. See Fasshauer & McCourt for details. Here I present the basic results, fixing the notation so we can compare the Machine Learning results with the physics.

Modeling with a Gaussian Kernel

When we model data with a vector space, and with a Laplacian or a Kernel, we have a choice: model the individual data points x, or the distances between points r_{x,z}=x-z. Usually we model the distances; however, we may always recover a point model using the Spectral properties of our Kernel:

K(x,z)=\sum_{n}\lambda_{n}\phi_{n}(x)\phi_{n}(z)

When K(x,z) is a Radial Basis Function (RBF) Kernel, we can expand it in a spectrum of orthogonal basis functions, under some measure \rho(x) chosen so that our functions are square integrable.

Let’s consider the one-dimensional case:

e^{-\epsilon^{2}(x-z)^{2}}=\sum_{n}\lambda_{n}\phi_{n}(x)\phi_{n}(z)

where the functions \phi_{n}(x) are orthonormal in L_{2}(\mathbb{R},\rho) , under the measure

\rho(x)=\frac{\alpha}{\sqrt{\pi}}e^{-\alpha^{2}x^{2}} , \alpha>0 so that

\int dx\,\rho(x)\phi_{m}(x)\phi_{n}(x)=\delta_{mn}

As with our Coherent States, we change the measure rather than change the eigenfunctions.  We could, of course, equivalently define

\psi_{n}(x)=\sqrt{\rho(x)}\phi_{n}(x)\sim e^{-\frac{1}{2}\alpha^{2}x^{2}}\phi_{n}(x)

Hermite Polynomials: Generating Functions

How can we find the eigenfunctions? Note that the RBF Kernel contains the Generating Function for the Hermite Polynomials. Let \epsilon=1 . Then write

e^{-(x-z)^{2}}=e^{-x^{2}+2xz-z^{2}}=e^{-x^{2}}e^{2xz-z^{2}}=e^{-x^{2}}w(x,z)

We can derive all known properties of the Hermite Polynomials from w(x,z)

\dfrac{\partial^{n}}{\partial z^{n}}w(x,z)\vert_{z=0}=H_{n}(x)
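As a quick sanity check, we can recover the first few H_{n}(x) symbolically from w(x,z) (a sketch using sympy; the helper name hermite_from_w is mine):

```python
import sympy as sp

x, z = sp.symbols('x z')
w = sp.exp(2 * x * z - z**2)  # the generating function w(x, z)

def hermite_from_w(n):
    """H_n(x) as the n-th z-derivative of w(x, z), evaluated at z = 0."""
    return sp.expand(sp.diff(w, z, n).subs(z, 0))

print(hermite_from_w(2))  # 4*x**2 - 2
```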

In particular, we can derive various integral relations, although usually we can also look them up in a table of integrals (e.g., Gradshteyn and Ryzhik). To derive our result, most papers cite equation (39) of the older paper by Zhu et al.:

\int_{-\infty}^{+\infty}e^{-(x-z)^{2}}H_{n}(\alpha x)dx=\sqrt{\pi}(1-\alpha^{2})^{\frac{n}{2}}H_{n}(\dfrac{\alpha z}{\sqrt{(1-\alpha^{2})}})

although, thanks to Wikipedia, today we can just look up Mehler’s formula directly:

\sum_{n=0}^\infty \frac{H_n(x)H_n(y)}{n!}\left(\frac u 2\right)^n= \frac 1 {\sqrt{1-u^2}} \mathrm{e}^{\frac{2u}{1+u}x y-\frac{u^2}{1-u^2}(x-y)^2}
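Both relations are easy to verify numerically (a sketch using scipy; the function names and the test values a, z, n, u are arbitrary choices of mine):

```python
import numpy as np
from math import factorial
from scipy.integrate import quad
from scipy.special import eval_hermite

# check Zhu et al.'s equation (39) for arbitrary a, z, n with |a| < 1
a, z, n = 0.6, 1.3, 4
lhs, _ = quad(lambda x: np.exp(-(x - z) ** 2) * eval_hermite(n, a * x),
              -np.inf, np.inf)
rhs = np.sqrt(np.pi) * (1 - a**2) ** (n / 2) \
      * eval_hermite(n, a * z / np.sqrt(1 - a**2))
print(abs(lhs - rhs))  # tiny

# check Mehler's formula by truncating the series, for |u| < 1
def mehler_lhs(x, y, u, nmax=80):
    return sum(eval_hermite(k, x) * eval_hermite(k, y) / factorial(k) * (u / 2) ** k
               for k in range(nmax))

def mehler_rhs(x, y, u):
    return np.exp(2 * u / (1 + u) * x * y
                  - u**2 / (1 - u**2) * (x - y) ** 2) / np.sqrt(1 - u**2)

print(abs(mehler_lhs(0.7, -0.3, 0.5) - mehler_rhs(0.7, -0.3, 0.5)))  # tiny
```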

This gives us some insight into the spectrum of the RBF Kernel; we may expect the eigenfunctions to be scaled Hermite polynomials, under the proper measure. We would need to work through all the messy details; instead, I will present the central result, with some older and newer references, and then take a look at some results from actual calculations.

Orthogonal Spectrum of the RBF Kernel: Hermite Polynomials

Here I present some older results; newer relations can be found in the 2 papers by Shi et al.

As shown in Fasshauer & McCourt, the eigenvalues \lambda_{n} and eigenfunctions \phi_{n} are (where I write n instead of their n-1 ):

\lambda_{n}=\sqrt{\dfrac{\alpha^{2}}{\alpha^{2}+\delta^{2}+\epsilon^{2}}}\left(\dfrac{\epsilon^{2}}{\alpha^{2}+\delta^{2}+\epsilon^{2}}\right)^{n}

\phi_{n}(x)=\gamma_{n}e^{-\delta^{2}x^{2}}H_{n}(\alpha\beta x) , n=0,1,\cdots

where H_{n} are the classical Hermite Polynomials

H_{n}(x)=(-1)^{n}e^{x^{2}}\frac{d^{n}}{dx^{n}}e^{-x^{2}}

and

\beta=\left(1+\left(\dfrac{2\epsilon}{\alpha}\right)^{2}\right)^{\frac{1}{4}} , \quad \gamma_{n}=\sqrt{\dfrac{\beta}{2^{n}n!}} , \quad \delta^{2}=\dfrac{\alpha^{2}}{2}\left(\beta^{2}-1\right)
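Putting the pieces together, we can check numerically that these eigenvalues and eigenfunctions really do reproduce the Gaussian Kernel (a sketch using scipy; the shape parameters and evaluation points are arbitrary choices of mine):

```python
import numpy as np
from math import factorial, sqrt
from scipy.special import eval_hermite

eps, alpha = 1.0, 2.0  # kernel and measure shape parameters (arbitrary)
beta = (1 + (2 * eps / alpha) ** 2) ** 0.25
delta2 = 0.5 * alpha**2 * (beta**2 - 1)

def lam(n):
    # eigenvalues: geometric decay in n
    d = alpha**2 + delta2 + eps**2
    return sqrt(alpha**2 / d) * (eps**2 / d) ** n

def phi(n, x):
    # eigenfunctions: Gaussian-damped, rescaled Hermite polynomials
    gamma_n = sqrt(beta / (2**n * factorial(n)))
    return gamma_n * np.exp(-delta2 * x**2) * eval_hermite(n, alpha * beta * x)

# truncated Mercer sum vs. the Gaussian kernel at an arbitrary point pair
x, z = 0.4, -0.9
series = sum(lam(n) * phi(n, x) * phi(n, z) for n in range(60))
print(abs(series - np.exp(-eps**2 * (x - z) ** 2)))  # tiny
```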

With this result, it is possible to apply traditional spectral techniques, similar to Spectral Clustering, to analyze data sets in machine learning. For example, Shi et al. apply RBF basis functions to create probabilistic mixture models describing their data.

We will look at this work more closely below; let us first lay out the intuition for this approach from known results and approaches in quantum physics and quantum chemical spectroscopy.

Quantum Mechanical Harmonic Oscillator

With a little physics review, a change of variables, and some scaling, we will soon see that the eigenfunctions of the RBF Kernel are, in fact, the same as the stationary states of the Quantum Harmonic Oscillator (QHO).

Of course, the eigenvalues / energy levels are quite different; the RBF eigenvalues decay geometrically to zero as n increases, whereas the QHO energy levels E_{n}=\hbar\omega(n+\frac{1}{2}) are evenly spaced.
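The correspondence is easiest to see from the standard textbook form of the QHO stationary states; in units where m=\hbar=\omega=1 ,

\psi_{n}(x)=\dfrac{1}{\sqrt{2^{n}n!\sqrt{\pi}}}H_{n}(x)e^{-\frac{1}{2}x^{2}} , \quad E_{n}=n+\dfrac{1}{2}

so, like \phi_{n}(x) above, each \psi_{n} is a Hermite polynomial damped by a Gaussian; matching the scale factors \alpha,\beta,\delta is just a change of variables.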


Data Spectroscopy: Quantum Mechanical Data Modeling

Imagine we consider each data point in \mathbb{R}^{n} as a simple quantum mechanical particle, and suppose we want to compute the total probability density of the collection of points. We do this using the techniques of Spectroscopy, meaning we express the Hamiltonian of the system in a primitive basis and diagonalize it. What basis? These are simple, bound particles, so we pick the simplest basis possible: Harmonic Oscillators. Each data point is a little QHO, and we seek to model the entire collection. This is the physical intuition behind Data Spectroscopy.
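As a toy sketch of this intuition (not the full algorithm from the Shi et al. papers; the cluster parameters, the kernel width, and all names here are arbitrary choices of mine): build the empirical Gaussian kernel matrix for two well-separated groups of points, diagonalize it, and label each point by whichever top eigenvector dominates there.

```python
import numpy as np

rng = np.random.default_rng(0)
# two well-separated 1-d "clusters" of data points
X = np.concatenate([rng.normal(-4, 0.5, 50), rng.normal(4, 0.5, 50)])

# empirical Gaussian kernel matrix K_ij = exp(-eps^2 (x_i - x_j)^2) / N
eps = 1.0
K = np.exp(-eps**2 * (X[:, None] - X[None, :]) ** 2) / len(X)

# the kernel matrix is nearly block-diagonal, so its top eigenvectors
# localize on the individual clusters
vals, vecs = np.linalg.eigh(K)
top = vecs[:, -2:]  # eigenvectors of the two largest eigenvalues

# assign each point to the component whose eigenvector dominates there
labels = np.argmax(np.abs(top), axis=1)
```

Because the cross-cluster kernel entries are essentially zero, each of the two leading eigenvectors lives on one cluster, and the labeling recovers the two groups.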

References:

C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning

H. Zhu, C. K. I. Williams, R. Rohwer & M. Morciniec, Gaussian Regression and Optimal Finite Dimensional Linear Models

G. E. Fasshauer & M. J. McCourt, Stable Evaluation of Gaussian RBF Interpolants

T. Shi, M. Belkin, and B. Yu, Data Spectroscopy: Eigenspaces of Convolution Operators and Clustering

T. Shi, M. Belkin, and B. Yu, Data spectroscopy: learning mixture models using eigenspaces of convolution operators

H. S. Wilf, generatingfunctionology

M. Girolami, Orthogonal Series Density Estimation and the Kernel Eigenvalue Problem (girolami-orthogonal.pdf)
