Causality vs Correlation: Granger Causality

One of the most repeated mantras of Machine Learning is that

“Correlation is not Causation!”

When faced with this statement, I’m never really sure how to respond. After all, the entire point of science is to measure correlations and other signals, and to build models that explain their causes and predict future events.

It is certainly true, however, that if we are naive, we can fool ourselves into seeing patterns that are not really there. This is especially true in financial and econometric time series, which rarely satisfy the simple assumptions of textbook statistics, such as stationarity and Gaussian noise. In our continuing studies of noisy time series, we do not seek to address “the fundamental philosophical and epistemological question of real causality,” [5] but, rather,

We seek practical methods that can detect a weak signal in noisy time series and model the underlying ‘cause’.

Science: the Search for Causation 

In a previous post, we looked for specific non-linear models of signals in very noisy data, such as gravitational waves and earthquake prediction. Could we find a gravitational wave or predict an earthquake by observing a specific non-linear pattern, or did we just have noise?

In Economics and Finance (i.e., on Wall Street), we seek patterns in noisy time series: patterns we can trade. Here we really need to understand the ‘cause’ of a pattern, because most financial time series are highly non-stationary and it is quite easy to overtrain just about any model and lose all our money.

In Chemical Physics, noise abounds, and we have to deal with it explicitly, whether we are finding simple models for the Brownian motion of particles suspended in water or wrapping our heads around highly non-equilibrium systems that appear to be dominated by random fluctuations.

In this series of posts, we review a very interesting paper from a few years ago [2] that relates the econometric notion of causality to the Mori-Zwanzig projection operator formalism of non-equilibrium statistical mechanics (and perhaps to some other models too). Along the way, we hope to

  • establish a deep relationship between Granger Causality and the Fluctuation-Dissipation Theorem, and
  • see a new test of Granger Causality that, in theory, should work much better in time series dominated by noise.

If time permits, we will also look at some more recent methods, such as the Thermal Optimal Path method developed by Sornette et al. [5], both as a practical tool and within the context of modern machine learning. See also the Kaggle contest below.

Granger Causality and Co-Integration

In 2003, Robert F. Engle (NYU) and Clive W. J. Granger (UC San Diego) won the Nobel Prize in Economics for their development of Statistical Methods for Economic Time Series.

In particular, their work gives us simple tests to determine whether one time series X(t) is being caused by another time series Y(t), even when the two are not correlated. That is:

Does Y(t) cause X(t)?

A classic example is a drunk walking her dog. Both the drunk and her dog follow a random path, but they still try to stay close to each other. The paths are not simply correlated; instead, we say the two paths are co-integrated.

Co-Integration is particularly useful in Pairs Trading [6].

Pairs Trading

In Pairs Trading, one selects two assets whose prices stay close enough to each other over time that they can be traded successfully against each other. The success of Pairs Trading has stimulated the search for sophisticated tests for co-integration. So how does one define, and test for, co-integration?

Two (or more) non-stationary time series, each of which becomes stationary after differencing once, are co-integrated if some linear combination of them is stationary

Put another way, 2 (or more) time series are co-integrated if they share a common stochastic drift. Engle and Granger developed a simple test for this [1].

Say we have a time series x(t) such that the next value x(t+\delta t) depends on its previous values. Usually we see this expressed as the linear (autoregressive) relationship

x_{t}=g_{1}x_{t-1}+g_{2}x_{t-2}+\cdots+g_{n}x_{t-n}+\text{residual}

although we can express this more generally,

X(t+\delta t)=G\circ X(t)+R_{X}(t)

where G is a linear or possibly non-linear operator acting on the history of X, and R_{X}(t) is the residual. We then ask: what causes X(t)? Or, rather, if we have another time series Y(t), we ask

Is X(t) co-integrated with Y(t)?

If we naively regress x(t) against y(t), we may find a strong correlation when no real relationship exists.

“Correlation is not Causation!”
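To see why a naive regression can mislead, here is a minimal Monte Carlo sketch, assuming Python with NumPy (the post itself names no implementation). It counts how often two completely independent series show a sample correlation larger than 0.3 in magnitude, first for random walks (non-stationary) and then for white noise (stationary). The step count, trial count, threshold and helper names are arbitrary, hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
N_STEPS, N_TRIALS, THRESHOLD = 500, 200, 0.3


def independent_walks():
    """Two independent random walks: non-stationary, no causal link."""
    return (np.cumsum(rng.standard_normal(N_STEPS)),
            np.cumsum(rng.standard_normal(N_STEPS)))


def independent_noise():
    """Two independent white-noise series: stationary, no causal link."""
    return rng.standard_normal(N_STEPS), rng.standard_normal(N_STEPS)


def frac_large_corr(make_pair):
    """Fraction of trials in which |corr(x, y)| exceeds THRESHOLD."""
    hits = 0
    for _ in range(N_TRIALS):
        x, y = make_pair()
        if abs(np.corrcoef(x, y)[0, 1]) > THRESHOLD:
            hits += 1
    return hits / N_TRIALS


frac_walks = frac_large_corr(independent_walks)
frac_noise = frac_large_corr(independent_noise)
print(f"random walks: {frac_walks:.2f}   white noise: {frac_noise:.2f}")
```

Independent random walks routinely cross the threshold, while independent white noise essentially never does. This spurious correlation between unrelated non-stationary series is exactly what co-integration and Granger-style tests are meant to guard against.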

  

Instead, we propose a relationship in which the future of X depends on the histories of both X(t) and Y(t), such that

X(t+\delta t)=H_{X}\circ X(t)+H_{Y}\circ Y(t)+R_{X,Y}(t)

and compare the 2 error functionals:

E_{X}(t_{a},t_{b})=\intop_{t_{a}}^{t_{b}}dt\left[R_{X}(t),R_{X}(t)\right]

E_{X,Y}(t_{a},t_{b})=\intop_{t_{a}}^{t_{b}}dt\left[R_{X,Y}(t),R_{X,Y}(t)\right]

We say that  

Y(t)  causes X(t) … in the Granger Sense …

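when E_{X,Y}(t_{a},t_{b}) is much smaller than E_{X}(t_{a},t_{b}) (in practice, “much smaller” is decided with a statistical test, typically an F-test comparing the two residual sums of squares).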

Another way of saying this, which might be more familiar to Machine Learning practitioners, is that we say Y(t)  causes X(t) when the future values of X(t) can be better predicted with the histories of both X(t) and Y(t) than just X(t) alone.
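Here is a minimal sketch of this comparison, assuming linear models: G, H_X and H_Y are taken to be finite autoregressive lags plus an intercept, and the code is Python with NumPy. The continuous error functionals E_X and E_{X,Y} become sums of squared residuals over the sample, and an F-statistic decides whether the drop from E_X to E_{X,Y} is large. The helper names `lagged_design` and `granger_f_test` and the lag order `p` are hypothetical; for real work, statsmodels [4] provides `grangercausalitytests` in `statsmodels.tsa.stattools`.

```python
import numpy as np


def lagged_design(series_list, p):
    """Intercept column plus p lags of every series in series_list."""
    n = len(series_list[0])
    cols = [np.ones(n - p)]
    for s in series_list:
        for k in range(1, p + 1):
            cols.append(s[p - k:n - k])  # s_{t-k} for t = p, ..., n-1
    return np.column_stack(cols)


def granger_f_test(x, y, p=2):
    """Compare E_X (own lags only) with E_{X,Y} (lags of x and y).

    Returns (E_X, E_XY, F). A large F means y's history helps predict x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = x[p:]
    # Restricted model: x_t from its own p lags (the G model).
    a_r = lagged_design([x], p)
    res_r = target - a_r @ np.linalg.lstsq(a_r, target, rcond=None)[0]
    # Unrestricted model: add p lags of y (the H_X, H_Y model).
    a_u = lagged_design([x, y], p)
    res_u = target - a_u @ np.linalg.lstsq(a_u, target, rcond=None)[0]
    e_x, e_xy = res_r @ res_r, res_u @ res_u  # discrete E_X and E_{X,Y}
    dof = len(target) - a_u.shape[1]
    f_stat = ((e_x - e_xy) / p) / (e_xy / dof)
    return e_x, e_xy, f_stat


# Simulated example: y drives x with a one-step lag; x does not drive y.
rng = np.random.default_rng(0)
n = 500
y = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + 0.1 * rng.standard_normal()

e_x, e_xy, f_y_to_x = granger_f_test(x, y)
_, _, f_x_to_y = granger_f_test(y, x)
print(f"E_X={e_x:.1f}  E_XY={e_xy:.1f}  F(y->x)={f_y_to_x:.1f}  F(x->y)={f_x_to_y:.1f}")
```

Because y drives x in the simulation, adding y's history should shrink the residual error sharply and give a very large F(y->x), while F(x->y) should stay small. That asymmetry is Granger causality in the sense defined above.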

Engle-Granger 2-Step Co-Integration Test

A classic co-integration test is the Engle-Granger 2-step test. Here, we test whether a linear combination of Y(t) and X(t) is stationary.

  1. Form a new time series u(t), the (weighted) difference of the two. This is usually done with a suitable linear regression of X(t) on Y(t), keeping the residuals: u(t)=X(t)-\beta Y(t)
  2. Apply the Augmented Dickey-Fuller (ADF) test to the residuals u(t) to see whether they form a stationary time series.

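In its simplest (Dickey-Fuller) form, the test estimates the regression coefficient \rho of u_{t+1} on u_{t}; the augmented version also includes lagged differences of u to absorb serial correlation: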

u_{t+1}=\rho u_{t}+\epsilon_{t+1}

If the estimated \rho is significantly less than 1, the residuals are stationary and we have a co-integrated process; if \rho is close to 1 (a unit root), the residuals are not stationary and the two series are not co-integrated. The ADF test is readily available in MATLAB [3] and Python [4].
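The two steps can be sketched in a few lines of Python with NumPy, assuming the simplest setting: one regressor, an intercept in step 1, and a plain (non-augmented) Dickey-Fuller regression in step 2. The function name `engle_granger_two_step` and the simulated drunk, dog and stranger series are illustrative assumptions, not the authors' code. One caveat: because \beta is estimated, the resulting t-statistic should be compared with Engle-Granger (MacKinnon) critical values rather than the standard ADF tables. In statsmodels [4], `adfuller` implements the ADF test itself and `coint` runs the Engle-Granger test with the appropriate critical values.

```python
import numpy as np


def engle_granger_two_step(x, y):
    """Step 1: regress x on y. Step 2: Dickey-Fuller regression on the residuals.

    Returns (beta, rho, t_stat). A rho well below 1 (very negative t_stat)
    points to stationary residuals, i.e. co-integration.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Step 1: x(t) = alpha + beta * y(t) + u(t), fitted by ordinary least squares.
    A = np.column_stack([np.ones_like(y), y])
    alpha, beta = np.linalg.lstsq(A, x, rcond=None)[0]
    u = x - alpha - beta * y
    # Step 2: u_{t+1} = rho * u_t + eps (no constant: u has sample mean zero).
    u_prev, u_next = u[:-1], u[1:]
    rho = (u_prev @ u_next) / (u_prev @ u_prev)
    eps = u_next - rho * u_prev
    sigma2 = (eps @ eps) / (len(eps) - 1)
    se_rho = np.sqrt(sigma2 / (u_prev @ u_prev))
    t_stat = (rho - 1.0) / se_rho  # t-statistic for the unit-root null rho = 1
    return beta, rho, t_stat


# Drunk and dog: the dog stays near the drunk; the stranger wanders independently.
rng = np.random.default_rng(1)
n = 1000
drunk = np.cumsum(rng.standard_normal(n))
dog = drunk + rng.standard_normal(n)
stranger = np.cumsum(rng.standard_normal(n))

beta_dog, rho_dog, t_dog = engle_granger_two_step(dog, drunk)
beta_str, rho_str, t_str = engle_granger_two_step(stranger, drunk)
print(f"dog vs drunk:      beta={beta_dog:.2f} rho={rho_dog:.2f} t={t_dog:.1f}")
print(f"stranger vs drunk: beta={beta_str:.2f} rho={rho_str:.2f} t={t_str:.1f}")
```

In this simulation the dog's residuals behave like white noise, so \rho should come out near 0 with a strongly negative t-statistic, while the stranger's residuals still wander like a random walk and \rho should stay close to 1.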

Contest

There is currently an open Kaggle contest on this very subject:

Given samples from a pair of variables A, B, find whether A is a cause of B.

http://www.kaggle.com/c/cause-effect-pairs

We provide hundreds of pairs of real variables with known causal relationships from domains as diverse as chemistry, climatology, ecology, economy, engineering, epidemiology, genomics, medicine, physics, and sociology. Those are intermixed with controls (pairs of independent variables and pairs of variables that are dependent but not causally related) and semi-artificial cause-effect pairs (real variables mixed in various ways to produce a given outcome).

It is a bit late to enter, but if anyone wants to make a last-minute contribution using one of these methods, please feel free to contact me about a collaboration.

Next Steps

Having laid out the basic framework, the next step is to look at the problem from the point of view of non-equilibrium statistical mechanics, and to see whether any of these old physics ideas are useful in the modern world of machine learning. Stay tuned (and anyone wanting to enter the contest, please contact me).

References

[1] Engle, Robert F., Granger, Clive W. J. (1987) “Co-integration and error correction: Representation, estimation and testing”, Econometrica, 55(2), 251–276.

[2] D. Hsu and M. Hsu (2009) Zwanzig-Mori projection operators and EEG dynamics: deriving a simple equation of motion

[3] MATLAB Econometrics Toolbox

[4] Python Statsmodels

[5] Didier Sornette and Wei-Xing Zhou (2004) Non-Parametric Determination of Real-Time Lag Structure between Two Time Series: The ‘Optimal Thermal Causal Path’ Method

See also: http://www.plosone.org/article/info:doi/10.1371/journal.pone.0022794

[6] High Frequency Statistical Arbitrage Via the Optimal Thermal Causal Path

7 Comments

  1. A drunk walking her dog, hilarious.

    Terrific post, can’t wait to see the rest–connections to statistical mechanics and thermodynamics are always a blast. I really need to get around to taking a serious statistical physics class.


  2. Isn’t the direction of the requirement for co-integration wrong? I thought it should be E_x(t_a, t_b) >> E_x,y(t_a, t_b). Maybe I misunderstood it.


  3. I think this is right:
    “If the coefficient rho << 1 , the residuals are stationary and we have a co-integrated process; if it is close to 1, the residuals are not stationary and we are not co-integrated."

