Kernels, Green’s Functions, and Resolvent Operators

Machine Learning uses Kernels. At least it tries when it can.

A Kernel is an operator on a Hilbert Space. Or we might say that it defines the Hilbert Space. Depends on our point of view.

Typically the Kernels in Machine Learning are quite simple , 1-dimensional, analytic functions, such as the Radial Basis Function (RBF) Kernel

$\phi(r)=e^{-(\sigma r)^{2}}$

By analytic we mean that the function can be expressed locally as a convergent power series.

Recall that we can express an RBF Kernel as convergent power series via the Fourier Transform

https://charlesmartin14.wordpress.com/2012/02/06/kernels_part_1

$\langle f|f\rangle=\sum_{n=0}^{\infty}\int\frac{\sigma^{2m}}{2^{m}m!}\Vert(\mathcal{O}^{m}f)(x)\Vert_{L_{2}}dx$

Recall also we showed how to construct a Kernel — here a continuously parameterized metric — as a re-summation of a discrete basis set:

https://charlesmartin14.wordpress.com/2012/09/06/kernels-part-2-affine-quantum-gravity-continued

$\int|\xi\rangle\langle\xi|\frac{dxdy}{\pi}=\sum_{n,m=0}^{\infty}\frac{|n\rangle\langle m|}{\sqrt{n!m!}}\frac{}{}\int\exp-|\xi^{2}|\xi^{n}\xi^{*m}\frac{dxdy}{\pi}$ … $= 1$

Here, I expand on this concept, and show to construct a special kind of Kernel, called the Resolvent Operator (R), from a convergent power series as a re-summation of a simpler Kernel (K) ,

$R\left(x, z;\lambda\right) = \sum^\infty_{n=0} \lambda^n K_{n+1} \left(x, z\right)$

In some later posts, we will show some examples and even try to do some new kinds of machine learning using methods build with the Resolvent Operator. But first, we need to review a little bit about

Kernels, Green’s Functions, and Integral Equations

In Machine Learning, we see the Kernel Trick as a way to expand a dot product

$\langle f'|f \rangle=\int f'(t)K(t,s)f(s)ds$

Generally speaking, Kernels arise in the solution of

Linear Integral Equations

$g(x)=A(x)f(x)+\int_{D}K(x,s)f(s)ds$ , $x\in D$

These equations take 3 forms

$A(x)=0$ for all $x\in D$ Inverse Problems
$A(x)\neq 0$ for all $x\in D$ Mathematical Physics
$A(x)\neq 0$ for some subset of $x\in D$

I spent the first decade of my career solving the second kind and the second decade solving the first kind. I would now like to see if I can apply the second kind to Machine Learning. But first, some notation and review. For simplicity, we consider 1-dimensional integrals, and let $A(x)=1$

Linear Integral Equations of the First Kind:

$g(t)=\int_{a}^{b}K(t,s)f(s)ds$

where the integrations limits (a,b) do not appear inside the integral.

The workhorse of Machine Learning, most practioners model their data using a simple linear inverse equation, perhaps with some simple non-linear Kernel or even a data-dependent one.

Greens Functions

Frequently, we can express the equation in terms of another linear (differential) operator $T$ , such that

$g(t)=(Tf)(x)=\int_{a}^{b}K(t,s)f(s)ds$

Then we refer to the Kernel as the Greens Function $G(x,s)$ ; it is the inverse of $T$

$TG(x,s)=\delta(x,s)$

Example: the Laplacian and the Brownian Bridge Kernel

See: G. E. Fasshauer, Green’s Functions: Taking Another Look at Kernel Approximation, Radial Basis Functions and Splines

We sometimes view of the Graph Laplacian L in Machine Learning as a discrete approximation of the differential operator

$L = \frac{d^{2}}{dx^{2}}$

For convenience, let us restrict ourself to the domain $\Omega=[0,1]$ . We can define a Kernel $K(x,s)$ , called the Brownian Bridge Kernel, such that

$LK(x,s)=-\frac{1}{2}\delta(x,s)$ , with boundary condition $K(0,s)=K(1,s)=0$

it can be shown that

$K(x,s)=x(1-s)$ when $x\leq s$ and $K(x,s)=s(1-x)$ when $x>s$

$K$ is the Greens Function for the Laplacian L, with the specified boundary conditions.

Convolution Operator

When the Kernel $K(x,s)$ or Greens Function $G(x,s)$ only depends on the difference $(x-s)$ , as with the RBF Kernel, we call it a Convolution Operator or an Impulse Response Function

$K(x,s)=K(x-s)$

and we can readily apply techniques such as the Fourier Transform to better understand it (as we did in our first post)

Linear Integral Equations of the Second Kind

$g(x)=f(x)-\lambda\int_{a}^{b}K(x,s)f(s)ds$ where $\lambda$ is a parmeter

The workhorse of Mathematical Physics, this model explicitly incorporates a kind of time or spatial dependence, such as in a time-dependent differential equation. We don’t see these explicitly in Machine Learning, however, there are many approaches to model spatial and time dependencies such as Hidden Markov Models, Conditional Random Fields, Observable Operator Models, etc. We shall look at these in a future blog post.

Is this approach only for time-dependent, Markov processes? The short answer-No. Many time dependent problems can be transformed into time-independent problems, and vice versa. A great example is the Holographic Principle in Quantum Gravity (another post, I promise, soon). So do not be too concerned with time; let us just consider what the final Kernel solution looks like.

The Resolvent Operator

We seek a solution $f(x)$ . Let us expand it as a power series

$f(x)=\sum\lambda^{n}\phi_{n}(x)$

then we have the leading term in the solution ( $\lambda=0$ ) is

$\phi_{0}(x)=g(x)$

and the higher order (n) terms are generated by applying the Kernel n times

$\phi_{1}(x)=\int K(x,z)g(x)dz$

$\phi_{2}(x)=\int K(x,y)\int K(y,z)g(x)dydz=\int K_{2}(x,z)g(x)dz$

$\cdots$

$\phi_{n}(x)=\int K_{n}(x,z)g(z)dz$ where $K_{n}$ is the n-th iterated Kernel

We can now re-sum the applied Kernels to obtain the

$R\left(x, z;\lambda\right) = \sum^\infty_{n=0} \lambda^n K_{n+1} \left(x, z\right)$

If we can find this, we can find our the general solution g from the Resolvent R and the starting function f

$g(x)=\int R\left(x, z;\lambda\right)f(x)dz$

We typically don’t see the Resolvent as an infinite sum. Instead, we assume the sum converges and just write

$R(\lambda,K)=(I-\lambda K)^{-1}$

There we have it. We now see the relation between the Kernel’s of machine learning, used to solve Inverse problems, with the Resolvent Operators in mathematical physics, used to solve Inhomogeneous Integral Equations (of the second kind)

Resolvents in Machine Learning: Kernels on Graphs

An example is the Regularized [Graph] Laplacian in equation (17) of

Smola and Kondor, Kernels on Graphs, http://enpub.fulton.asu.edu/cseml/07spring/kenel_graphs.pdf

$R=(I-\sigma^{2}L)^{-1}$ where $L$ is a Graph Laplacian.

Notice that Smola and Kondor relate their Graph Kernels to Page Rank and HITS. Here we can see a connection: HITS (and to some extend PageRank) is essentially equivalent to finding the principle eigenvector of the associated Markov matrix for the web graph. Solving an eigenvector problem is, in fact, a Linear Integral Equation of the Second Kind (where $g(x)=0$ ), and is frequently solved with the Resolvent methods (I’ll explain in a later post). So it is no surprise to this blogger that the Graph Kernels and Page Rank / HITS are closely related.

In our next post, we look at Graph Laplacians and their usefulness in Spectral Clustering: A Quick Overview.

Later, (much later), we will show to combine the Resolvent Operators with Projection Operators to form an approach that looks a lot like Bayesian modeling for these classes of equations.

Kernels, Green’s Functions, and Resolvent Operators