One of the most repeated mantras of Machine Learning is that
“Correlation is not Causation!”
When faced with this statement, I’m never really sure how to respond. After all, the entire point of science is to measure correlations and other signals and determine models that explain their cause and can predict future events.
It is certainly true, however, that if we are naive, we can fool ourselves into seeing patterns that are not really there. This is especially true in financial and econometric time series, which do not seem to follow any of the simple laws of statistics. In our continuing studies of noisy time series, we do not seek to address “the fundamental philosophical and epistemological question of real causality” [5], but, rather,
we seek practical methods that can detect a weak signal in noisy time series and model the underlying ‘cause’.
Science: the Search for Causation
In a previous post, we looked for specific non-linear models of signals in very noisy data, such as gravity waves and earthquake prediction. Could we find a gravity wave or predict an earthquake by observing a specific non-linear pattern, or did we just have noise?
In Economics and Finance (i.e., on Wall Street), we seek patterns in noisy time series: patterns we can trade. Here, we really need to understand the ‘cause’ of the pattern, because most financial time series are highly non-stationary, and it is quite easy to overfit just about any model and lose all our money.
In Chemical Physics, noise abounds, and we have to deal with it explicitly, whether we are finding simple models for the Brownian motion of particles floating in water or wrapping our heads around highly non-equilibrium systems that appear to be dominated by random fluctuations.
In this series of posts, we review a very interesting paper that came out a few years ago [2], which establishes a relationship between econometric notions of causality and the Mori-Zwanzig projection operator formalism of non-equilibrium statistical mechanics (and perhaps some other models too). Along the way, we will
- establish a deep relationship between Granger Causality and the Fluctuation-Dissipation Theorem, and
- see a new test of Granger Causality that, in theory, should work much better in time series dominated by noise.
If time permits, we will also look at some more recent methods, such as the Thermal Optimal Path method developed by Sornette et al. [5], both as a practical tool and within the context of modern machine learning. See also the Kaggle contest below.
Granger Causality and Co-Integration
In 2003, Robert F. Engle (NYU) and Clive W. J. Granger (UC San Diego) won the Nobel Prize in Economics for their development of Statistical Methods for Economic Time Series.
In particular, they developed simple tests to determine whether one time series is being caused by another time series, even when the two are not correlated. That is:
Does $y(t)$ cause $x(t)$?
A classic example is a drunk walking her dog. Both the drunk and her dog follow random paths, but they still try to stay close to each other. The paths are not actually correlated; instead, we say the two paths are co-integrated.
Co-integration is particularly useful in Pairs Trading [6]: one selects two assets that stay close enough to each other over time that the spread between them can be traded successfully. The success of Pairs Trading has stimulated the search for sophisticated tests for co-integration. So how does one define, and test for, co-integration?
Two non-stationary time series are co-integrated if some linear combination of them is stationary.
Or, more technically, two (or more) time series are co-integrated if they share a common stochastic drift. Granger and Engle developed a simple test for this.
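To make the drunk-and-dog picture concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption rather than part of the original discussion: the function names, the seed, the unit step size, and the AR(1) "leash" coefficient of 0.5. The drunk's path is a random walk whose spread keeps growing, while the difference dog minus drunk stays bounded, which is the stationary linear combination the definition asks for.

```python
import random


def simulate_drunk_and_dog(n_steps=2000, leash=0.5, seed=42):
    """Hypothetical drunk-and-dog simulation.

    The drunk follows a pure random walk (non-stationary). The dog sits at
    the drunk's position plus an AR(1) offset that keeps pulling back toward
    zero, so the spread dog - drunk is stationary.
    """
    rng = random.Random(seed)
    drunk, dog = [], []
    position, offset = 0.0, 0.0
    for _ in range(n_steps):
        position += rng.gauss(0.0, 1.0)                # random-walk step
        offset = leash * offset + rng.gauss(0.0, 1.0)  # mean-reverting "leash"
        drunk.append(position)
        dog.append(position + offset)
    return drunk, dog


def sample_variance(series):
    mean = sum(series) / len(series)
    return sum((v - mean) ** 2 for v in series) / (len(series) - 1)


drunk, dog = simulate_drunk_and_dog()
spread = [d - p for d, p in zip(dog, drunk)]  # the linear combination dog - drunk
print(f"variance of the drunk's path: {sample_variance(drunk):.1f}")
print(f"variance of the spread      : {sample_variance(spread):.2f}")
```

With a leash coefficient of 0.5 and unit-variance shocks, the spread's variance should hover near 1/(1 - 0.5²) ≈ 1.33, while the drunk's path variance grows roughly in proportion to the number of steps.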
Say we have a time series $x(t)$ such that $x(t)$ depends on all previous values of $x$. Usually we see this expressed as the linear relationship
$$x(t) = \sum_{j=1}^{p} a_j\, x(t-j) + \epsilon(t),$$
where $\epsilon(t)$ is a residual, although we can express this more generally as
$$x(t) = f\big(x(t-1), x(t-2), \ldots\big) + \epsilon(t),$$
where $f$ is any linear or possibly non-linear relationship. We then ask: what causes $x(t)$? Or, rather, if we have another time series $y(t)$, we ask
Is x(t) co-integrated with y(t)?
If we naively regress $x(t)$ against $y(t)$, we might find a correlation when none exists.
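The classic spurious-regression experiment shows the danger. The sketch below uses assumed parameters (200 pairs of independent Gaussian random walks, 500 steps each, a fixed seed) and hypothetical helper names. Although every pair is independent by construction, the correlation of the levels is typically large in magnitude, while the correlation of the increments is close to zero. For a simple regression, R² is just the squared correlation, so a naive regression on the levels would look deceptively strong.

```python
import random


def random_walk(n, rng):
    path, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def diff(series):
    return [b - a for a, b in zip(series, series[1:])]


def mean_abs_correlation(n_pairs=200, n_steps=500, seed=7):
    """Average |correlation| between pairs of *independent* random walks,
    measured on the levels and on the first differences (increments)."""
    rng = random.Random(seed)
    levels, increments = 0.0, 0.0
    for _ in range(n_pairs):
        a = random_walk(n_steps, rng)
        b = random_walk(n_steps, rng)
        levels += abs(correlation(a, b))
        increments += abs(correlation(diff(a), diff(b)))
    return levels / n_pairs, increments / n_pairs


lvl, inc = mean_abs_correlation()
print(f"mean |corr| of levels    : {lvl:.2f}")
print(f"mean |corr| of increments: {inc:.3f}")
```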
“Correlation is not Causation!”
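Instead, we propose a relationship between $x$ and $y$ such that
$$x(t) = f\big(x(t-1), x(t-2), \ldots\big) + g\big(y(t-1), y(t-2), \ldots\big) + \epsilon'(t),$$
and compare the two error functionals:
$$E_x(t_a, t_b) = \sum_{t=t_a}^{t_b} \epsilon(t)^2, \qquad E_{x,y}(t_a, t_b) = \sum_{t=t_a}^{t_b} \epsilon'(t)^2.$$
We say that $y$ causes $x$
… in the Granger sense …
when $E_{x,y}(t_a, t_b)$ is much smaller than $E_x(t_a, t_b)$.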
Another way of saying this, which might be more familiar to Machine Learning practitioners, is that we say $y$ causes $x$ when the future values of $x$ can be better predicted with the histories of both $x$ and $y$ than with the history of $x$ alone.
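The comparison of $E_x$ and $E_{x,y}$ can be sketched with ordinary least squares, assuming linear $f$ and $g$ with $p$ lags each (the usual linear Granger test; a non-linear $f$ or $g$ would need a different fitting method). The helper names `ols_rss` and `granger_errors` and the simulated data are hypothetical. The function also returns the standard F statistic for the $p$ extra coefficients, which is the usual way to make "much smaller" formal.

```python
import random


def ols_rss(rows, targets):
    """Ordinary least squares via the normal equations (A^T A) b = A^T t.

    `rows` holds one regressor vector per observation (put 1.0 first for an
    intercept). Returns the residual sum of squares of the fitted model.
    """
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T t]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, k):
            factor = m[row][col] / m[col][col]
            for c in range(col, k + 1):
                m[row][c] -= factor * m[col][c]
    # Back substitution
    coef = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(m[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (m[i][k] - tail) / m[i][i]
    return sum((t - sum(c * v for c, v in zip(coef, r))) ** 2
               for r, t in zip(rows, targets))


def granger_errors(x, y, p=1):
    """Return (E_x, E_xy, F) for 'does y Granger-cause x?' with p lags.

    E_x : RSS when x(t) is predicted from x(t-1..t-p) only
    E_xy: RSS when x(t) is predicted from x(t-1..t-p) and y(t-1..t-p)
    F   : F statistic for the p extra y coefficients
    """
    targets, restricted, unrestricted = [], [], []
    for t in range(p, len(x)):
        x_lags = [x[t - j] for j in range(1, p + 1)]
        y_lags = [y[t - j] for j in range(1, p + 1)]
        targets.append(x[t])
        restricted.append([1.0] + x_lags)
        unrestricted.append([1.0] + x_lags + y_lags)
    e_x = ols_rss(restricted, targets)
    e_xy = ols_rss(unrestricted, targets)
    dof = len(targets) - (2 * p + 1)  # observations minus unrestricted parameters
    f_stat = ((e_x - e_xy) / p) / (e_xy / dof)
    return e_x, e_xy, f_stat


# Hypothetical data: y is white noise and drives x with a one-step delay.
rng = random.Random(0)
n = 2000
y = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [0.0]
for t in range(1, n):
    x.append(0.5 * x[-1] + 0.8 * y[t - 1] + rng.gauss(0.0, 1.0))

e_x, e_xy, f_yx = granger_errors(x, y)
e_y, e_yx, f_xy = granger_errors(y, x)
print(f"y -> x: E_xy / E_x = {e_xy / e_x:.2f}, F = {f_yx:.1f}")
print(f"x -> y: E_yx / E_y = {e_yx / e_y:.3f}, F = {f_xy:.1f}")
```

Because $y$ feeds into $x$ but not the other way around, the ratio should be well below 1 (and F large) in the $y \to x$ direction, and close to 1 in the $x \to y$ direction.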
Engle-Granger 2-Step Co-Integration Test
A classic co-integration test is the Engle-Granger 2-step test. Here, we test whether a linear combination of $x(t)$ and $y(t)$ is stationary.
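- form a new time series $e(t)$, which is a (weighted) difference of the two. This is usually done with a suitable linear regression:
$$x(t) = \alpha + \beta\, y(t) + e(t), \quad \text{i.e.} \quad e(t) = x(t) - \alpha - \beta\, y(t);$$
- apply the Augmented Dickey-Fuller (ADF) test on the residuals $e(t)$ to see if they form a stationary time series.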
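The two steps can be sketched as follows. This is a simplified illustration under stated assumptions, not a drop-in replacement for a library ADF test: step 2 uses a plain Dickey-Fuller regression $\Delta e(t) = \gamma\, e(t-1) + u(t)$ with no lagged-difference terms, and the function names and simulated series are hypothetical. Because the residuals come from an estimated regression, the resulting t-statistic should be compared against Engle-Granger (residual-based) critical values, which are more negative than the ordinary Dickey-Fuller ones; look them up (e.g., in MacKinnon's tables) rather than using standard t-test cutoffs.

```python
import random


def cointegrating_regression(x, y):
    """Step 1: fit x(t) = alpha + beta * y(t) + e(t) by least squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    beta = (sum((b - mean_y) * (a - mean_x) for a, b in zip(x, y))
            / sum((b - mean_y) ** 2 for b in y))
    alpha = mean_x - beta * mean_y
    residuals = [a - alpha - beta * b for a, b in zip(x, y)]
    return alpha, beta, residuals


def dickey_fuller_t(e):
    """Step 2 (simplified): t-statistic of gamma in
    delta_e(t) = gamma * e(t-1) + u(t), with no intercept and no lagged
    differences, i.e. a plain (non-augmented) Dickey-Fuller regression.
    The OLS residuals from step 1 already have zero mean."""
    lagged = e[:-1]
    delta = [b - a for a, b in zip(e, e[1:])]
    sxx = sum(v * v for v in lagged)
    gamma = sum(a * d for a, d in zip(lagged, delta)) / sxx
    resid = [d - gamma * a for a, d in zip(lagged, delta)]
    s2 = sum(r * r for r in resid) / (len(delta) - 1)
    return gamma / (s2 / sxx) ** 0.5


def random_walk(n, rng):
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


# Hypothetical data: x tracks 2 + 1.5*y plus a stationary AR(1) error,
# while z is an unrelated random walk.
rng = random.Random(1)
n = 1000
y = random_walk(n, rng)
z = random_walk(n, rng)
noise, u = [], 0.0
for _ in range(n):
    u = 0.5 * u + rng.gauss(0.0, 1.0)
    noise.append(u)
x = [2.0 + 1.5 * b + eps for b, eps in zip(y, noise)]

_, beta_xy, e_xy = cointegrating_regression(x, y)
_, _, e_zy = cointegrating_regression(z, y)
t_xy = dickey_fuller_t(e_xy)
t_zy = dickey_fuller_t(e_zy)
print(f"x on y: beta = {beta_xy:.3f}, DF t = {t_xy:.1f}")
print(f"z on y: DF t = {t_zy:.1f}")
```

The co-integrated pair should give a hedge ratio near 1.5 and a strongly negative t-statistic, while the unrelated pair should give a t-statistic that is far less negative.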
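The ADF test estimates the regression coefficient $\rho$ of $e(t)$ on $e(t-1)$, with lagged differences $\Delta e(t-j) = e(t-j) - e(t-j-1)$ added to absorb serial correlation (the "augmented" part):
$$e(t) = \rho\, e(t-1) + \sum_{j=1}^{p} \phi_j\, \Delta e(t-j) + u(t).$$
If $\rho$ is well below 1 (formally, if the test rejects the unit-root null hypothesis $\rho = 1$), the residuals are stationary and we have a co-integrated process; if $\rho$ is close to 1, the residuals are not stationary and we are not co-integrated. The ADF test is readily available in Matlab [3] and Python [4].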
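To see the $\rho$ rule in action, the sketch below fits the ADF regression with a single lagged difference ($p = 1$, no intercept) by solving the 2×2 normal equations directly, once on a stationary AR(1) series (true coefficient 0.5, an assumed value) and once on a random walk (true coefficient 1). The function name `adf_rho` is hypothetical, and the code only estimates $\rho$; an actual ADF test also forms a test statistic and compares it against Dickey-Fuller critical values.

```python
import random


def adf_rho(e):
    """Estimate rho in e(t) = rho*e(t-1) + phi*delta_e(t-1) + u(t)
    (one lagged difference, no intercept) via the 2x2 normal equations."""
    s_aa = s_ab = s_bb = s_az = s_bz = 0.0
    for t in range(2, len(e)):
        a = e[t - 1]                 # lagged level e(t-1)
        b = e[t - 1] - e[t - 2]      # lagged difference delta_e(t-1)
        z = e[t]                     # target e(t)
        s_aa += a * a
        s_ab += a * b
        s_bb += b * b
        s_az += a * z
        s_bz += b * z
    det = s_aa * s_bb - s_ab * s_ab
    return (s_az * s_bb - s_ab * s_bz) / det  # Cramer's rule for rho


rng = random.Random(3)
n = 2000
stationary, walk = [], []
s = w = 0.0
for _ in range(n):
    s = 0.5 * s + rng.gauss(0.0, 1.0)  # AR(1) with coefficient 0.5
    w = w + rng.gauss(0.0, 1.0)        # random walk (unit root)
    stationary.append(s)
    walk.append(w)

rho_stationary = adf_rho(stationary)
rho_walk = adf_rho(walk)
print(f"stationary residuals : rho ~ {rho_stationary:.3f}")
print(f"random-walk residuals: rho ~ {rho_walk:.3f}")
```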
Contest
There is currently an open Kaggle contest on this very subject:
Given samples from a pair of variables A, B, find whether A is a cause of B.
http://www.kaggle.com/c/cause-effect-pairs
We provide hundreds of pairs of real variables with known causal relationships from domains as diverse as chemistry, climatology, ecology, economy, engineering, epidemiology, genomics, medicine, physics, and sociology. Those are intermixed with controls (pairs of independent variables and pairs of variables that are dependent but not causally related) and semi-artificial cause-effect pairs (real variables mixed in various ways to produce a given outcome).
It is a bit late to enter, but if anyone wants to make a last-minute contribution using one of these methods, please feel free to contact me about a collaboration.
Next Steps
Having laid out the basic framework, the next steps are to look at the problem from the point of view of non-equilibrium statistical mechanics, and to see if any of these old physics ideas are useful in the modern world of machine learning. Stay tuned (and anyone wanting to do the contest, please contact me).
References
[1] Engle, Robert F., Granger, Clive W. J. (1987) “Co-integration and error correction: Representation, estimation and testing”, Econometrica, 55(2), 251–276.
[2] D. Hsu and M. Hsu (2009) Zwanzig-Mori projection operators and EEG dynamics: deriving a simple equation of motion
[3] MatLab Econometrics Module
[5] Didier Sornette and Wei-Xing Zhou (2004) Non-Parametric Determination of Real-Time Lag Structure between Two Time Series: The ‘Optimal Thermal Causal Path’ Method
See also: http://www.plosone.org/article/info:doi/10.1371/journal.pone.0022794
[6] High Frequency Statistical Arbitrage Via the Optimal Thermal Causal Path
A drunk walking her dog, hilarious.
Terrific post, can’t wait to see the rest–connections to statistical mechanics and thermodynamics are always a blast. I really need to get around to taking a serious statistical physics class.
Thanks.
Isn’t the direction of the requirement for co-integration wrong? I thought it should be E_x(t_a, t_b) >> E_x,y(t_a, t_b). Maybe I misunderstood it.
yeah you are right. thanks
I think this is right:
“If the coefficient rho << 1 , the residuals are stationary and we have a co-integrated process; if it is close to 1, the residuals are not stationary and we are not co-integrated."
I believe this is correct.
Reblogged this on Scientific Rants and commented:
Nice intro to co-integration