Causality vs Correlation: Granger Causality

One of the most repeated mantras of Machine Learning is that

“Correlation is not Causation!”

When faced with this statement, I’m never really sure how to respond. After all, the entire point of science is to measure correlations and other signals and determine models that explain their cause and can predict future events.

It is certainly true, however, that if we are naive, we can fool ourselves into seeing patterns that are not really there. This is especially true in financial and econometric time series, which do not seem to follow any of the simple laws of statistics. In our continuing studies of noisy time series, we do not seek to address “the fundamental philosophical and epistemological question of real causality,” [5] but, rather,

We seek practical methods that can detect a weak signal in noisy time series, and model the underlying ‘cause’

Science: the Search for Causation

In a previous post, we looked for specific non-linear models of signals in very noisy data, such as gravity waves and earthquake prediction. Could we find a Gravity Wave or predict an Earthquake by observing a specific non-linear pattern, or did we just have noise?

In Economics and Finance (i.e. on Wall Street) we seek patterns in noisy time series: patterns we can trade. Here, we really need to understand the ‘cause’ of the pattern, because most financial time series are highly non-stationary and it is quite easy to overtrain just about any model and lose all our money.

In Chemical Physics, noise abounds, and we have to deal with it explicitly, be it finding simple models for the Brownian motion of particles floating in water, or wrapping our heads around highly non-equilibrium systems that appear to be dominated by random fluctuations.

In this series of posts, we review a very interesting paper that came out a few years ago [2] that establishes the relationship between econometric notions of causality and the Mori-Zwanzig projection operator formalism of non-equilibrium statistical mechanics (and perhaps some other models too). We will

establish a deep relationship between Granger Causality and the Fluctuation-Dissipation Theorem, and
see a new test of Granger Causality that, in theory, should work much better in time series dominated by noise.

If time permits, we will also look at some more recent methods, such as the Thermal Optimal Path Method developed by Sornette et al. [5], both as a practical tool, and within the context of modern machine learning. See also the Kaggle contest below.

Granger Causality and Co-Integration

In 2003, Robert F. Engle (NYU) and Clive W. J. Granger (UC San Diego) won the Nobel Prize in Economics for their development of Statistical Methods for Economic Time Series.

In particular, they developed simple tests to determine if one time series $X(t)$ is being caused by another time series $Y(t)$, even when they are not correlated. That is:

Does $Y(t)$ cause $X(t)$ ?

A classic example is to look at a drunk walking her dog. Both the drunk and her dog follow a random path, but they still try to stay close to each other. The paths are not actually correlated. Instead we say the 2 paths are Co-Integrated.

Co-Integration is particularly useful in Pairs Trading [6]

One selects 2 assets that stay close enough to each other over time that they can be successfully traded. The success of Pairs Trading has stimulated the search for sophisticated tests for co-integration. So how does one define, and test for, co-integration?

Two non-stationary time series are co-integrated if some linear combination of them is stationary

Or, more technically, 2 (or more) time series are co-integrated if they share a common stochastic drift. Granger and Engle developed a simple test for this.
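To make the drunk-and-dog picture concrete, here is a minimal simulation sketch. The error-correction form of the dog's walk and the `pull` strength are assumptions chosen for illustration, not something taken from Engle and Granger's paper. Each walker takes independent random steps, yet the difference between the two paths stays bounded.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
pull = 0.2  # hypothetical strength with which the dog is drawn back to the drunk

drunk = np.zeros(n)
dog = np.zeros(n)
for t in range(1, n):
    # The drunk takes an independent random step.
    drunk[t] = drunk[t - 1] + rng.normal()
    # The dog takes its own random step, plus a small correction toward the drunk.
    dog[t] = dog[t - 1] + pull * (drunk[t - 1] - dog[t - 1]) + rng.normal()

# Candidate stationary linear combination of the two paths.
spread = drunk - dog

# Correlation between the individual steps of the two walkers.
step_corr = np.corrcoef(np.diff(drunk), np.diff(dog))[0, 1]

print(f"correlation of the individual steps: {step_corr:.3f}")
print(f"variance of the drunk's path:        {np.var(drunk):.1f}")
print(f"variance of the spread:              {np.var(spread):.1f}")
```

The steps are essentially uncorrelated, and each path wanders without bound, but the spread has a small, stable variance. That bounded spread is what a co-integration test looks for.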

Say we have a time series $x(t)$ such that $x(t+\delta t)$ depends on all previous values of $x(t)$ . Usually we see this expressed as the linear relationship

$x_{t}=g_{1}x_{t-1}+g_{2}x_{t-2}+\cdots+g_{n}x_{t-n}+\text{residual}$

although we can express this more generally,

$X(t+\delta t)=G\circ X(t)+R_{X}(t)$

where $G$ is any linear or possibly non-linear relationship. We then ask what causes $X(t)$ ? Or, rather, if we have another time series $Y(t)$ , we ask

Is X(t) Co-Integrated with Y(t)?

If we naively regress $x(t)$ against $y(t)$ , we might find a correlation when none exists.

“Correlation is not Causation!”
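The danger of naive regression can be seen in a small experiment. The sketch below is an illustration under assumed settings (series length, number of trials, and the 0.5 threshold are arbitrary choices). It draws pairs of completely independent series and counts how often their sample correlation looks large, once for white noise and once for random walks.

```python
import numpy as np

rng = np.random.default_rng(1)
n_steps, n_trials = 500, 200


def white_noise():
    """A stationary series of independent Gaussian draws."""
    return rng.normal(size=n_steps)


def random_walk():
    """A non-stationary series: the running sum of independent Gaussian steps."""
    return np.cumsum(rng.normal(size=n_steps))


def frac_large_corr(make_series, threshold=0.5):
    """Fraction of trials in which two independent series have |r| > threshold."""
    hits = 0
    for _ in range(n_trials):
        r = np.corrcoef(make_series(), make_series())[0, 1]
        hits += int(abs(r) > threshold)
    return hits / n_trials


frac_noise = frac_large_corr(white_noise)
frac_walks = frac_large_corr(random_walk)

print(f"independent white noise, |r| > 0.5: {frac_noise:.0%} of pairs")
print(f"independent random walks, |r| > 0.5: {frac_walks:.0%} of pairs")
```

Independent white-noise series essentially never show a large correlation, while independent random walks do so in a sizeable fraction of trials. This is why regressing one non-stationary series directly on another is unreliable.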

Instead, we propose a relationship between $Y(t)$ and $X(t)$ , such that

$X(t+\delta t)=H_{X}\circ X(t)+H_{Y}\circ Y(t)+R_{X,Y}(t)$

and compare the 2 error functionals:

$E_{X}(t_{a},t_{b})=\intop_{t_{a}}^{t_{b}}dt\left[R_{X}(t),R_{X}(t)\right]$

$E_{X,Y}(t_{a},t_{b})=\intop_{t_{a}}^{t_{b}}dt\left[R_{X,Y}(t),R_{X,Y}(t)\right]$

We say that

$Y(t)$ causes $X(t)$ … in the Granger Sense …

when $E_{X,Y}(t_{a},t_{b})$ is much smaller than $E_{X}(t_{a},t_{b})$

Another way of saying this, which might be more familiar to Machine Learning practitioners, is that we say $Y(t)$ causes $X(t)$ when the future values of $X(t)$ can be better predicted with the histories of both $X(t)$ and $Y(t)$ than just $X(t)$ alone.
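As a rough sketch of this comparison, we can take $G$ and $H$ to be linear autoregressions fit by least squares, and replace the integrals by sums of squared residuals. The helper names (`lagged`, `residual_error`), the lag order, and the synthetic data in which $Y$ drives $X$ are all assumptions made for illustration.

```python
import numpy as np


def lagged(series, p):
    """Matrix whose row for time t holds series[t-1], ..., series[t-p]."""
    n = len(series)
    return np.column_stack([series[p - k : n - k] for k in range(1, p + 1)])


def residual_error(target, *histories, p=2):
    """Sum of squared residuals when target[t] is regressed on p lags of each history."""
    columns = [np.ones(len(target) - p)] + [lagged(h, p) for h in histories]
    design = np.column_stack(columns)
    future = target[p:]
    coef, *_ = np.linalg.lstsq(design, future, rcond=None)
    return float(np.sum((future - design @ coef) ** 2))


# Synthetic data (assumed for illustration): y drives x with a one-step delay.
rng = np.random.default_rng(2)
n = 2000
y = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + rng.normal()

E_x = residual_error(x, x)      # history of X only
E_xy = residual_error(x, x, y)  # histories of X and Y
E_y = residual_error(y, y)      # history of Y only
E_yx = residual_error(y, y, x)  # histories of Y and X

print(f"E_XY / E_X = {E_xy / E_x:.2f}  (does Y help predict X?)")
print(f"E_YX / E_Y = {E_yx / E_y:.2f}  (does X help predict Y?)")
```

Adding the history of $Y$ cuts the error in predicting $X$ substantially, while adding the history of $X$ barely changes the error in predicting $Y$. In practice "much smaller" is judged with a statistical test such as an F-test on the two residual sums; the ratio here is only meant to show the direction of the effect.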

Granger 2-step Causality Test

A classic co-integration test is the 2-step Granger test. Here, we test if a linear combination of $Y(t)$ and $X(t)$ is stationary.

Form a new time series $u(t)$, which is the difference of the two. This is usually done with some suitable linear regression: $X(t)-\beta Y(t)=u(t)$
Apply the Augmented Dickey-Fuller (ADF) test on the residuals $u(t)$ to see if it is a stationary time series.

The ADF test estimates the regression coefficient $\rho$ of $u_{t+1}$ on $u_{t}$ :

$u_{t+1}=\rho u_{t}+\epsilon$

If the coefficient $\rho$ is well below 1, the residuals are stationary and we have a co-integrated process; if it is close to 1, the residuals are not stationary and we are not co-integrated. The ADF test is readily available in Matlab [3] and Python [4].
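The two steps can be sketched in a few lines of NumPy. This is a simplified illustration, not a full ADF test: it estimates $\beta$ and $\rho$ by plain least squares, and omits the lagged difference terms and critical values that a real ADF implementation uses. The function name `two_step_rho` and the synthetic series are assumptions made for the example.

```python
import numpy as np


def two_step_rho(x, y):
    """Simplified two-step check: returns the slope rho of u[t+1] on u[t]."""
    # Step 1: least-squares fit of x = beta * y, then form the residuals u.
    beta = np.dot(y, x) / np.dot(y, y)
    u = x - beta * y
    # Step 2: least-squares slope of u[t+1] regressed on u[t].
    return float(np.dot(u[:-1], u[1:]) / np.dot(u[:-1], u[:-1]))


rng = np.random.default_rng(3)
n = 2000
y = np.cumsum(rng.normal(size=n))          # a random walk

x_coint = 2.0 * y + rng.normal(size=n)     # tied to y by a stationary spread
x_indep = np.cumsum(rng.normal(size=n))    # an unrelated random walk

rho_coint = two_step_rho(x_coint, y)
rho_indep = two_step_rho(x_indep, y)

print(f"co-integrated pair: rho = {rho_coint:.3f}")
print(f"independent walks:  rho = {rho_indep:.3f}")
```

For the co-integrated pair the residuals look like noise and $\rho$ is far below 1; for the independent walks the residuals themselves wander and $\rho$ sits very close to 1. For real data, a proper test with critical values, such as the ADF test from the packages cited above, should be used instead of reading $\rho$ by eye.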

Contest

There is currently an open Kaggle contest on this very subject

Given samples from a pair of variables A, B, find whether A is a cause of B.

http://www.kaggle.com/c/cause-effect-pairs

We provide hundreds of pairs of real variables with known causal relationships from domains as diverse as chemistry, climatology, ecology, economy, engineering, epidemiology, genomics, medicine, physics, and sociology. Those are intermixed with controls (pairs of independent variables and pairs of variables that are dependent but not causally related) and semi-artificial cause-effect pairs (real variables mixed in various ways to produce a given outcome).

It is a bit late to enter, but if anyone wants to make a last hour contribution using one of these methods, please feel free to contact me for a collaboration.

Next Steps

Having laid out the basic framework, the next steps are to look at the problem from the point of view of non-equilibrium statistical mechanics, and see if any of these old physics ideas are useful in the modern world of machine learning. Stay tuned (and anyone wanting to do the contest, please contact me).

References

[1] Engle, Robert F., Granger, Clive W. J. (1987) “Co-integration and error correction: Representation, estimation and testing”, Econometrica, 55(2), 251–276.

[2] D. Hsu and M. Hsu (2009) Zwanzig-Mori projection operators and EEG dynamics: deriving a simple equation of motion

[3] MatLab Econometrics Module

[4] Python Statsmodels

[5] Didier Sornette and Wei-Xing Zhou (2004) Non-Parametric Determination of Real-Time Lag Structure between Two Time Series: The ‘Optimal Thermal Causal Path’ Method

see also : http://www.plosone.org/article/info:doi/10.1371/journal.pone.0022794

[6] High Frequency Statistical Arbitrage Via the Optimal Thermal Causal Path

7 Comments

Rick says:

June 25, 2013 at 3:58 pm

A drunk walking her dog, hilarious.

Terrific post, can’t wait to see the rest–connections to statistical mechanics and thermodynamics are always a blast. I really need to get around to taking a serious statistical physics class.

1. charlesmartin14 says:
  
  June 25, 2013 at 4:01 pm
  
  Thanks.
  
Lei says:

July 21, 2013 at 7:07 pm

Isn’t the direction of the requirement for co-integration wrong? I thought it should be E_x(t_a, t_b) >> E_x,y(t_a, t_b). Maybe I misunderstood it.

1. charlesmartin14 says:
  
  July 21, 2013 at 7:31 pm
  
  yeah you are right. thanks
  
Madi says:

March 6, 2014 at 2:00 pm

I think this is right:
“If the coefficient rho << 1 , the residuals are stationary and we have a co-integrated process; if it is close to 1, the residuals are not stationary and we are not co-integrated."

1. Niko says:
  
  June 26, 2015 at 6:17 pm
  
  I believe this is correct.
  
samuelandjw says:

February 10, 2015 at 1:57 pm

Reblogged this on Scientific Rants and commented:
Nice intro to co-integration

Published by Charles H Martin, PhD
