Noisy Time Series II: Earthquakes, Black Holes, and Machine Learning

Recently, seven Italian scientists were sentenced to prison for manslaughter for failing to predict the 2009 L'Aquila earthquake!

So how in the world would a machine learning scientist predict an earthquake? You might think: just collect all the data you can find, stuff it into Hadoop, and run some supervised machine learning algorithms. Eh… not so much!

Instead, we will borrow models from astronomy and theoretical physics to describe the process, and see whether the techniques developed for detecting weak patterns in astronomy can be applied to detecting earthquakes and other crashes in nature.

Eventually, we will also see how to convert a highly non-convex optimization problem into a convex one, a linear program (LP). It is going to take me some time to get there, so first, some motivation.

Scale Invariance in Nature

When we look out at natural phenomena, we see fractals, in what seems to be a state of equilibrium. Indeed, many natural objects resemble fractals, such as trees, mountains, and coastlines.

A famous math problem is to compute the length of the coastline of Britain. The catch is that the shorter your ruler, the longer the coastline seems: with a 200 km ruler we measure about 2,400 km, with a 50 km ruler about 3,400 km, and so on.

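Mathematically, we call this behavior scale invariance: shrinking the ruler by a fixed factor always multiplies the measured length by the same power of that factor, no matter which scale we start from.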
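The two ruler measurements can be turned into a fractal dimension $D$. A minimal sketch, assuming Richardson's empirical relation $L(\epsilon) = K\epsilon^{1-D}$ between ruler length $\epsilon$ and measured length $L$ (the relation and the helper name `fractal_dimension` are not from the text above):

```python
import math

def fractal_dimension(ruler_a, length_a, ruler_b, length_b):
    """Solve Richardson's relation L(eps) = K * eps**(1 - D) for D,
    given two (ruler, measured length) pairs."""
    return 1.0 - math.log(length_b / length_a) / math.log(ruler_b / ruler_a)

# Coastline of Britain: 200 km ruler -> about 2400 km, 50 km ruler -> about 3400 km
D = fractal_dimension(200.0, 2400.0, 50.0, 3400.0)
print(f"D = {D:.2f}")  # prints D = 1.25
```

A smooth curve would give $D = 1$; the excess over 1 measures how much extra detail appears each time the ruler shrinks.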

A classic mathematical model of a scale-invariant process is Brownian motion, also known as a Wiener process, $W(\mu,\sigma)$.

A Wiener Process is Scale Invariant

That is, we model the process $f(t)$ as having an underlying drift $\mu$ and a volatility $\sigma$, driven by a standard Wiener process $W(t)$. The governing stochastic differential equation is

$df(t) = \mu\,dt + \sigma\,dW(t)$

This describes growth at rate $\mu$ (linear in $t$, i.e., a power law with exponent 1), decorated with random fluctuations whose typical size grows like $\sigma\sqrt{t}$. Many physical systems exhibit this kind of stochastic scale invariance and growth. Determining whether we have power-law growth is difficult enough; detecting patterns within the randomness is even harder. Yet we see the results, the crashes, in nature all the time.
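A minimal simulation sketch of $df = \mu\,dt + \sigma\,dW$, with arbitrary parameter values and hypothetical helper names. For constant $\mu$ and $\sigma$, summing Gaussian increments is exact. It checks the scaling property of Brownian motion: increments over a lag $\tau$ have variance $\sigma^2\tau$ regardless of the drift, so variance against lag on a log-log plot has slope 1 (equivalently, $W(ct)$ has the same distribution as $\sqrt{c}\,W(t)$).

```python
import numpy as np

def simulate_drifted_bm(mu, sigma, T=100.0, n=100_000, seed=0):
    """Simulate df = mu dt + sigma dW on [0, T] with n steps, starting at f(0) = 0."""
    rng = np.random.default_rng(seed)
    dt = T / n
    dW = rng.normal(0.0, np.sqrt(dt), size=n)    # Wiener increments ~ N(0, dt)
    f = np.concatenate([[0.0], np.cumsum(mu * dt + sigma * dW)])
    return f, dt

def increment_variance(f, lag):
    """Variance of f(t + lag*dt) - f(t); np.var subtracts the mean drift mu*lag*dt."""
    return np.var(f[lag:] - f[:-lag])

f, dt = simulate_drifted_bm(mu=0.5, sigma=2.0)
lags = 2 ** np.arange(9)                         # 1, 2, 4, ..., 256 steps
variances = np.array([increment_variance(f, k) for k in lags])

# Scale invariance: Var ~ sigma^2 * tau, so the log-log slope should be close to 1
slope = np.polyfit(np.log(lags * dt), np.log(variances), 1)[0]
print(f"log-log slope = {slope:.2f}")
```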

How Nature Works: Chaos, Crashes and Critical Phenomena

Nature is not, in fact, in equilibrium. Chaos, crashes, and critical events occur everywhere: earthquakes, avalanches, and other natural disasters threaten us every day. In theoretical chemistry and physics, we call these events critical phenomena. It has been proposed [1,2] that these “catastrophic events are ‘outliers’ with statistically different properties than the rest of the population and result from [internal, self-amplifying, cascading] mechanisms.” We therefore need a different kind of statistical theory, one that can deal with inherently non-equilibrium processes near a critical point, such as crashes, phase transitions, and other catastrophic events.

This theory is Renormalization Group (RG) theory. (Ken Wilson won the Nobel Prize for it, and was a professor of physics at my undergraduate school, Ohio State.) RG theory says that near a critical point $t_{c}$, such as a phase transition or a crash, the dynamics change dramatically. Fluctuations start appearing on all time and length scales, hence the term scale invariance, and the dynamics lose their purely stochastic character. The system still follows a power law

$x(t)=A+B(t_{c}-t)^{\alpha}$

but the seemingly random fluctuations are, in fact, governed by solutions of the 1-D Renormalization Group equations. A simple model for this in discrete physical systems is called discrete scale invariance, and it is governed by

$x(t)=A+B(t_{c}-t)^{\alpha}(1+C\cos(\omega\log(t_{c}-t)+\phi))$

where $t_{c}$ is the time at which the critical event occurs, and $\omega$ is the frequency of the oscillation; note that the oscillations are periodic in $\log(t_{c}-t)$, not in $t$. (This equation is the first-order solution of the RG flow map near the critical point of a phase transition on a discrete lattice, although we apply it far from the critical point to all sorts of natural phenomena.)
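A quick numerical check of what "discrete" means here. Because the cosine depends on $\log(t_{c}-t)$, shrinking the distance to $t_{c}$ by the factor $\lambda = e^{2\pi/\omega}$ shifts the phase by exactly $2\pi$, so $x - A$ is rescaled by the constant $\lambda^{-\alpha}$; an arbitrary factor such as 2 does not give a constant ratio. The function name `lppl` (log-periodic power law) and the parameter values are illustrative assumptions, not fitted to any data.

```python
import numpy as np

def lppl(t, A, B, tc, alpha, C, omega, phi):
    """x(t) = A + B (tc - t)^alpha (1 + C cos(omega log(tc - t) + phi)), valid for t < tc."""
    d = tc - t
    return A + B * d**alpha * (1.0 + C * np.cos(omega * np.log(d) + phi))

# Illustrative parameter values (not fitted to any data)
p = dict(A=1.0, B=-0.5, tc=10.0, alpha=0.6, C=0.1, omega=6.0, phi=0.3)
lam = np.exp(2 * np.pi / p["omega"])     # preferred scaling ratio

d = np.array([0.5, 1.0, 2.0, 4.0])      # distances to the critical time
x_far = lppl(p["tc"] - d, **p) - p["A"]

# Shrinking (tc - t) by lam rescales x - A by exactly lam**(-alpha) ...
ratio_lam = (lppl(p["tc"] - d / lam, **p) - p["A"]) / x_far
# ... but shrinking it by an arbitrary factor, e.g. 2, gives a ratio that varies with d
ratio_two = (lppl(p["tc"] - d / 2, **p) - p["A"]) / x_far

print(lam, lam ** -p["alpha"])
print(ratio_lam)
print(ratio_two)
```

The symmetry holds only for the discrete family of factors $\lambda^n$, which is what distinguishes DSI from the continuous scale invariance of a pure power law.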

If nature really does display these discrete scale invariant (DSI) patterns before an event like an earthquake, we would hope to detect them, and with enough confidence that we don’t end up in jail.

To predict an earthquake, we can track the concentration of unusual chemicals in the local groundwater as a function of time [6], which yields a noisy time series.

The problem for the scientist, machine learning or otherwise, is to distinguish random fluctuations from log-periodic behavior, and to predict the critical time $t_{c}$.

The graph on the left shows a simple fit of the DSI log-periodic function, overlaid on the data. Is this a good fit? Do we believe it? The problem is to fit this non-linear curve to the data with some confidence. We need to determine the power-law exponent $\alpha$, the frequency of oscillation $\omega$, and, most importantly, the critical time $t_{c}$ when the earthquake will occur. (The other parameters, $A$, $B$, $C$ and $\phi$, can be slaved to these.)
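The remark about slaving the other parameters can be made concrete. Expanding $C\cos(\omega\log(t_{c}-t)+\phi) = C\cos\phi\cos(\omega\log(t_{c}-t)) - C\sin\phi\sin(\omega\log(t_{c}-t))$ shows that, for fixed $(t_{c}, \alpha, \omega)$, the model is linear in $A$, $B$, $BC\cos\phi$ and $BC\sin\phi$, so ordinary least squares gives them directly. A sketch with hypothetical helper names and made-up, noise-free data:

```python
import numpy as np

def lppl(t, A, B, tc, alpha, C, omega, phi):
    d = tc - t
    return A + B * d**alpha * (1.0 + C * np.cos(omega * np.log(d) + phi))

def slaved_fit(t, x, tc, alpha, omega):
    """For fixed (tc, alpha, omega), fit the linear model
    x = A + B p + C1 p cos(omega log d) + C2 p sin(omega log d),  d = tc - t,  p = d^alpha,
    where C1 = B C cos(phi) and C2 = -B C sin(phi). Returns (coefficients, SSE)."""
    d = tc - t
    p = d**alpha
    X = np.column_stack([np.ones_like(d), p,
                         p * np.cos(omega * np.log(d)),
                         p * np.sin(omega * np.log(d))])
    coef, *_ = np.linalg.lstsq(X, x, rcond=None)
    return coef, np.sum((x - X @ coef) ** 2)

truth = dict(A=1.0, B=-0.5, tc=10.0, alpha=0.6, C=0.1, omega=6.0, phi=0.3)
t = np.linspace(0.0, 9.0, 200)
x = lppl(t, **truth)                      # noise-free synthetic series

coef, _ = slaved_fit(t, x, truth["tc"], truth["alpha"], truth["omega"])

# Only the nonlinear parameters still need a search, e.g. a coarse grid over tc
tc_grid = [9.5, 10.0, 10.5, 11.0]
sses = [slaved_fit(t, x, tc, truth["alpha"], truth["omega"])[1] for tc in tc_grid]
best_tc = tc_grid[int(np.argmin(sses))]
print(coef, best_tc)
```

Here $\alpha$ and $\omega$ are held at their true values to keep the grid one-dimensional; with real data they would be searched too, and that outer search is where the non-convexity lives.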

A classic time-series (and astronomy) approach is to detrend the series (fit $\alpha$ first) and then find the best $(\omega, t_{c})$ by examining the periodogram computed with least-squares spectral analysis (LSSA), as explained in our last post.
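A minimal sketch of the periodogram step, assuming the series has already been detrended to $y(t) = \cos(\omega\log(t_{c}-t)+\phi)$. For each candidate $t_{c}$ the oscillation is periodic in $\tau = \log(t_{c}-t)$, which is unevenly spaced, so we compute a least-squares periodogram in $\tau$ (fitting a constant plus a sinusoid at each trial frequency) and keep the $(\omega, t_{c})$ pair with the strongest peak. The function `lssa_power`, the grids, and the noise-free data are illustrative assumptions, not the exact procedure from the previous post.

```python
import numpy as np

def lssa_power(tau, y, omegas):
    """Least-squares spectral analysis on unevenly spaced tau: for each omega, fit
    y ~ a0 + a cos(omega tau) + b sin(omega tau) and return the fraction of the
    variance of y explained by the fit."""
    total = np.sum((y - y.mean()) ** 2)
    power = np.empty(len(omegas))
    for i, w in enumerate(omegas):
        X = np.column_stack([np.ones_like(tau), np.cos(w * tau), np.sin(w * tau)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        power[i] = 1.0 - np.sum((y - X @ coef) ** 2) / total
    return power

# Detrended synthetic series with true omega = 6 and tc = 10
t = np.linspace(0.0, 9.4, 300)
y = np.cos(6.0 * np.log(10.0 - t) + 0.3)

omegas = np.linspace(1.0, 15.0, 281)
best = {}
for tc in [9.5, 10.0, 10.5, 11.0]:
    power = lssa_power(np.log(tc - t), y, omegas)   # periodogram in tau = log(tc - t)
    best[tc] = (power.max(), omegas[power.argmax()])

best_tc = max(best, key=lambda k: best[k][0])
best_omega = best[best_tc][1]
print(best_tc, best_omega)
```

With noise, a trend estimated from the data, and a fine grid over both parameters, the peak heights become much less decisive, which is the difficulty the next paragraph points to.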

Because the underlying optimization problem is highly non-convex, it is very difficult to get a good fit!
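To see the non-convexity concretely, here is a sketch of the least-squares loss as a function of $\omega$ alone, under simplifying assumptions: $t_{c}$ is known, the series is already detrended, and the samples are evenly spaced in $\tau = \log(t_{c}-t)$ (i.e., geometrically denser as $t$ approaches $t_{c}$). Even this one-dimensional slice has several local minima (sidelobes around the true frequency), so a local optimizer started in the wrong place can get stuck.

```python
import numpy as np

# Evenly spaced tau = log(tc - t) (an assumption made for a clean illustration)
tau = np.linspace(0.0, 10.0, 500)
y = np.cos(6.0 * tau)                   # noise-free oscillation, true omega = 6

def sse(omega):
    """Squared error of the one-parameter model cos(omega * tau)."""
    return np.sum((y - np.cos(omega * tau)) ** 2)

omegas = np.linspace(3.0, 9.0, 1201)
errors = np.array([sse(w) for w in omegas])

# A convex loss has at most one strict local minimum; count the interior ones here
is_min = (errors[1:-1] < errors[:-2]) & (errors[1:-1] < errors[2:])
print("global minimum at omega =", omegas[np.argmin(errors)])
print("number of local minima:", np.count_nonzero(is_min))
```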

It turns out that astronomers have recently developed some very sophisticated (and convex) machine learning methods for a very similar problem: detecting gravitational waves.

Gravitational Waves

General relativity predicts that when two co-rotating neutron stars collide, they can form a black hole and cause a massive space-time vibration, called a gravitational wave.

Simple gravitational waveforms take the form [7]

$S(t)=A(t_{c}-t)^{-\frac{1}{4}}\cos\left(\omega(t_{c}-t)^{\frac{5}{8}}+\phi\right)$

Here, the collision time is the critical time $t_{c}$. Gravitational waves are very weak signals, and we need a very clever approach to tell what is really a wave and what is just noise.
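A small numerical sketch of why this waveform is called a chirp: the phase $\omega(t_{c}-t)^{5/8}$ changes ever faster as $t \to t_{c}$, so the instantaneous frequency grows like $(t_{c}-t)^{-3/8}$ while the envelope grows like $(t_{c}-t)^{-1/4}$. The helper `gw_chirp` and the parameter values are hypothetical, chosen only for illustration (arbitrary units).

```python
import numpy as np

def gw_chirp(t, A, tc, omega, phi):
    """S(t) = A (tc - t)^(-1/4) cos(omega (tc - t)^(5/8) + phi), valid for t < tc."""
    d = tc - t
    return A * d**-0.25 * np.cos(omega * d**0.625 + phi)

tc, omega = 1.0, 200.0                  # illustrative values in arbitrary units
t = np.linspace(0.0, 0.999, 20_000)
S = gw_chirp(t, A=1.0, tc=tc, omega=omega, phi=0.0)

# Instantaneous frequency = |d(phase)/dt| / (2 pi), estimated by finite differences
phase = omega * (tc - t) ** 0.625
f_inst = np.abs(np.gradient(phase, t)) / (2 * np.pi)

# Fit f ~ (tc - t)^slope on a log-log scale
slope = np.polyfit(np.log(tc - t), np.log(f_inst), 1)[0]
print(f"frequency exponent = {slope:.3f} (theory: -3/8 = -0.375)")
print(f"envelope grows from {(tc - t[0]) ** -0.25:.2f} to {(tc - t[-1]) ** -0.25:.2f}")
```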

Chirps

Generally speaking, we can classify DSI-type functions as chirps: functions that oscillate rapidly within a slowly varying envelope. A chirp takes the form

$f(t)=A(t)\cos(\gamma\phi(t))$

where the amplitude $A$ and the phase $\phi$ are smoothly varying functions of time, and the degree of oscillation $\gamma$ is large.
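To make the classification concrete, here is a sketch that writes both the oscillating part of the DSI model and the gravitational-wave form as $A(t)\cos(\gamma\phi(t))$, with the constant phase offset absorbed into $\phi(t)$; in both cases $\gamma$ plays the role of $\omega$. The generic `chirp` helper and all parameter values are hypothetical.

```python
import numpy as np

def chirp(t, amplitude, phase, gamma):
    """Generic chirp f(t) = A(t) cos(gamma * phi(t)); amplitude and phase are callables."""
    return amplitude(t) * np.cos(gamma * phase(t))

tc, alpha, omega, phi, B, C = 10.0, 0.6, 6.0, 0.3, -0.5, 0.1   # illustrative values
t = np.linspace(0.0, 9.9, 500)
d = tc - t

# DSI oscillation B C d^alpha cos(omega log d + phi):
# A(t) = B C d^alpha, phi(t) = log d + phi/omega, gamma = omega
dsi = chirp(t, lambda s: B * C * (tc - s) ** alpha,
            lambda s: np.log(tc - s) + phi / omega, gamma=omega)

# Gravitational-wave form d^(-1/4) cos(omega d^(5/8) + phi):
# A(t) = d^(-1/4), phi(t) = d^(5/8) + phi/omega, gamma = omega
gw = chirp(t, lambda s: (tc - s) ** -0.25,
           lambda s: (tc - s) ** 0.625 + phi / omega, gamma=omega)

# Both differences from the direct formulas should be at rounding level
print(np.max(np.abs(dsi - B * C * d**alpha * np.cos(omega * np.log(d) + phi))))
print(np.max(np.abs(gw - d**-0.25 * np.cos(omega * d**0.625 + phi))))
```

The two models differ only in the phase function: logarithmic for DSI, a $5/8$ power for the inspiral.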

Nature Shows the Way

Like crashes, chirps occur everywhere in nature. For example, bats use chirps as part of their echolocation sonar.

Many modern machine learning methods take clues from nature to build better detectors. The so-called deep learning methods, used by Andrew Ng and Google to detect cat faces, numbers on houses, and so on, are inspired by our understanding of how the human retina and visual cortex recognize images.

There are also machine learning methods designed to mimic bat sonar; the one we are interested in here is called chirplet basis pursuit [8],[9]. It was specifically designed to detect gravitational waves, and we will try to use it to detect discrete scale invariance (without going to jail). And we will do this using convex optimization!
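A minimal sketch of the convex step, assuming standard basis pursuit, which the name suggests: choose a large, fixed dictionary $\Phi$ of candidate oscillating atoms and find the coefficient vector with the smallest $\ell_1$ norm that reproduces the signal exactly, $\min \|c\|_1$ subject to $\Phi c = y$. Splitting $c = u - v$ with $u, v \ge 0$ turns this into a linear program. The log-periodic dictionary, the helper names `log_periodic_dictionary` and `basis_pursuit`, and the synthetic signal are hypothetical and are not the chirplet dictionary of [8],[9]. The solver is SciPy's `linprog` with the HiGHS method.

```python
import numpy as np
from scipy.optimize import linprog

def log_periodic_dictionary(t, tcs, omegas):
    """Hypothetical dictionary of unit-norm cos/sin(omega * log(tc - t)) atoms."""
    atoms, labels = [], []
    for tc in tcs:
        for w in omegas:
            for trig in (np.cos, np.sin):
                a = trig(w * np.log(tc - t))
                atoms.append(a / np.linalg.norm(a))
                labels.append((tc, float(w), trig.__name__))
    return np.column_stack(atoms), labels

def basis_pursuit(Phi, y):
    """min ||c||_1 s.t. Phi c = y, solved as an LP in (u, v) with c = u - v, u, v >= 0."""
    n = Phi.shape[1]
    res = linprog(np.ones(2 * n), A_eq=np.hstack([Phi, -Phi]), b_eq=y,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]

t = np.linspace(0.0, 9.0, 60)
Phi, labels = log_periodic_dictionary(t, tcs=[10.0, 10.5, 11.0, 12.0],
                                      omegas=np.arange(2.0, 12.0))

# Synthetic signal built from two atoms of the (overcomplete, 60 x 80) dictionary
c_true = np.zeros(Phi.shape[1])
c_true[labels.index((10.0, 6.0, "cos"))] = 1.0
c_true[labels.index((11.0, 4.0, "sin"))] = 0.5
y = Phi @ c_true

c = basis_pursuit(Phi, y)
for i in np.flatnonzero(np.abs(c) > 1e-6):
    print(labels[i], round(c[i], 3))
```

Because the problem is an LP, any optimum the solver finds is global, unlike the non-convex curve fit. Whether the true atoms are recovered exactly depends on how similar (coherent) the dictionary atoms are, so the printed support is worth inspecting rather than assuming.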