# Time Series Analysis: Cointegration Tests

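### Cointegration

Cointegration matters for several reasons:

• First, individual series may be non-stationary, yet cointegration lets us establish a stationary relationship between two or more of them, so that the properties of stationary series can be used.
• Second, it guards against spurious regression. If a set of non-stationary time series is not cointegrated, a regression model built on them may be spurious.
• Third, it separates the long-run equilibrium relationship between variables from their short-run fluctuations.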
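The first point can be illustrated with simulated data. The sketch below is only an illustration under stated assumptions: the series `x` and `y`, the shared random-walk trend, and the coefficient 2.0 are all made up for the example. Each series wanders without bound, while the combination `y - 2x` stays around a fixed level.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Shared stochastic trend: a random walk, which is non-stationary, I(1).
common_trend = np.cumsum(rng.normal(size=n))

# Two hypothetical series driven by the same trend plus independent noise.
x = common_trend + rng.normal(scale=0.5, size=n)
y = 2.0 * common_trend + 1.0 + rng.normal(scale=0.5, size=n)

# The linear combination y - 2x cancels the trend and leaves stationary noise.
spread = y - 2.0 * x


def half_variances(series):
    """Return the sample variance of the first and second half of a series."""
    mid = len(series) // 2
    return np.var(series[:mid]), np.var(series[mid:])


print("x halves:     ", half_variances(x))
print("spread halves:", half_variances(spread))
```

The variance of the spread should be similar in both halves, whereas the variance of `x` depends heavily on the window, which is the informal signature of a non-stationary series.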

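### Engle-Granger Two-Step Cointegration Test

The EG test is essentially a unit-root test applied to the residuals of a regression equation.

The two-step procedure proposed by Engle and Granger is as follows: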

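1. Estimate the cointegrating regression by OLS to obtain the cointegrating coefficients. In the standard two-variable notation:

$$y_t = \alpha + \beta x_t + \varepsilon_t$$

The fitted residuals are $\hat{\varepsilon}_t = y_t - \hat{\alpha} - \hat{\beta} x_t$.

2. Test $\hat{\varepsilon}_t$ for stationarity. If $\hat{\varepsilon}_t$ is stationary, then $x_t$ and $y_t$ are cointegrated; otherwise they are not. The stationarity of $\hat{\varepsilon}_t$ is usually checked with an ADF test.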
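The two steps can be sketched with plain NumPy. This is a minimal illustration, not the statsmodels implementation: the function name `engle_granger_sketch` is hypothetical, the second step uses a Dickey-Fuller regression without lag augmentation, and the simulated inputs are assumptions made for the example. Because the residuals come from an estimated regression, the resulting statistic has to be compared against Engle-Granger critical values rather than the standard ADF ones.

```python
import numpy as np


def engle_granger_sketch(y, x):
    """Two-step sketch: OLS, then a Dickey-Fuller regression on the residuals."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)

    # Step 1: OLS of y on a constant and x gives the cointegrating coefficients.
    design = np.column_stack([np.ones_like(x), x])
    (alpha, beta), *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - alpha - beta * x

    # Step 2: Dickey-Fuller regression without lags,
    #   diff(resid)_t = rho * resid_{t-1} + u_t,
    # and the t-statistic of rho.
    lagged = resid[:-1]
    delta = np.diff(resid)
    rho = lagged @ delta / (lagged @ lagged)
    u = delta - rho * lagged
    sigma2 = u @ u / (len(delta) - 1)
    t_stat = rho / np.sqrt(sigma2 / (lagged @ lagged))
    return alpha, beta, t_stat


# Simulated cointegrated pair (assumed data, for illustration only).
rng = np.random.default_rng(1)
trend = np.cumsum(rng.normal(size=500))
x = trend + rng.normal(scale=0.3, size=500)
y = 0.5 + 1.5 * trend + rng.normal(scale=0.3, size=500)

alpha, beta, t_stat = engle_granger_sketch(y, x)
print(alpha, beta, t_stat)
```

A strongly negative `t_stat` points toward stationary residuals, and `beta` should land close to the 1.5 used in the simulation.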

### Johansen Cointegration Test
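For reference, the Johansen procedure is usually stated in terms of a vector error correction model. The notation below ($X_t$ for the vector of $n$ series, $\Pi$, $\Gamma_i$, $\hat{\lambda}_i$, $T$ for the sample size) follows the common textbook formulation and is an assumption here, not something taken from this post.

$$\Delta X_t = \Pi X_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \, \Delta X_{t-i} + \varepsilon_t$$

The rank $r$ of $\Pi$ equals the number of cointegrating relations. With the estimated eigenvalues ordered as $\hat{\lambda}_1 \ge \dots \ge \hat{\lambda}_n$, the two test statistics are

$$\lambda_{\text{trace}}(r) = -T \sum_{i=r+1}^{n} \ln\left(1 - \hat{\lambda}_i\right), \qquad \lambda_{\max}(r) = -T \ln\left(1 - \hat{\lambda}_{r+1}\right)$$

The trace statistic tests the null of at most $r$ cointegrating relations against more than $r$; the maximum-eigenvalue statistic tests $r$ against $r+1$. Unlike the Engle-Granger approach, this handles more than two series and more than one cointegrating vector. In statsmodels, an implementation is available as `coint_johansen` in `statsmodels.tsa.vector_ar.vecm`.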

---

### Cointegration Testing in Python

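```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# a_price and b_price are assumed to be pandas objects holding the two
# price series; loading them is not shown here.
fig, ax = plt.subplots()
ax.plot(range(len(a_price)), a_price)
ax.plot(range(len(b_price)), b_price)
ax.legend(['a', 'b'])
plt.show()
```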

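From the plot, the two instruments appear strongly correlated, and both are non-stationary. The next step checks with an ADF test whether their first differences are stationary.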

```python
from statsmodels.tsa.stattools import adfuller

# Flatten each price series to a 1-D array and take first differences.
a_price = np.reshape(a_price.values, -1)
a_price_diff = np.diff(a_price)

b_price = np.reshape(b_price.values, -1)
b_price_diff = np.diff(b_price)

# ADF test on each differenced series.
print(adfuller(a_price_diff))
print(adfuller(b_price_diff))

# Output:
# (-15.436034211511204, 2.90628134201655e-28, 0, 198, {'1%': -3.4638151713286316, '5%': -2.876250632135043, '10%': -2.574611347821651}, 1165.1556545612445)
# (-14.259156751414892, 1.4365811614283181e-26, 0, 198, {'1%': -3.4638151713286316, '5%': -2.876250632135043, '10%': -2.574611347821651}, 1152.4222884399824)
```
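The tuples printed by `adfuller` are easier to read once unpacked. The sketch below reuses the first result tuple shown above; the helper name `summarize_adf` is hypothetical, and the field order (statistic, p-value, lags used, number of observations, critical values, information criterion) is the one `adfuller` returns with its default arguments.

```python
# Result tuple copied from the ADF output for the first differenced series.
adf_result = (-15.436034211511204, 2.90628134201655e-28, 0, 198,
              {'1%': -3.4638151713286316, '5%': -2.876250632135043,
               '10%': -2.574611347821651}, 1165.1556545612445)


def summarize_adf(result):
    """Unpack an adfuller-style tuple and flag where the unit-root null is rejected."""
    stat, pvalue, used_lag, nobs, crit_values, icbest = result
    # The null of a unit root is rejected when the statistic is below the critical value.
    rejected = {level: stat < value for level, value in crit_values.items()}
    return {"stat": stat, "pvalue": pvalue, "used_lag": used_lag,
            "nobs": nobs, "rejected": rejected}


summary = summarize_adf(adf_result)
print(summary["rejected"])  # {'1%': True, '5%': True, '10%': True}
```

Rejection at every level means the differenced series can be treated as stationary.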

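Both ADF statistics are far below the 1% critical value (about -3.46), so the differenced series are stationary and the two price series can be treated as I(1), which is what `coint` assumes. The signature and docstring of the statsmodels `coint` function are as follows: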

```python
def coint(y0, y1, trend='c', method='aeg', maxlag=None, autolag='aic',
          return_results=None):
    """Test for no-cointegration of a univariate equation

    The null hypothesis is no cointegration. Variables in y0 and y1 are
    assumed to be integrated of order 1, I(1).

    This uses the augmented Engle-Granger two-step cointegration test.
    Constant or trend is included in 1st stage regression, i.e. in
    cointegrating equation.

    **Warning:** The autolag default has changed compared to statsmodels 0.8.
    In 0.8 autolag was always None, no the keyword is used and defaults to
    'aic'. Use `autolag=None` to avoid the lag search.

    Parameters
    ----------
    y1 : array_like, 1d
        first element in cointegrating vector
    y2 : array_like
        remaining elements in cointegrating vector
    trend : str {'c', 'ct'}
        trend term included in regression for cointegrating equation

        * 'c' : constant
        * 'ct' : constant and linear trend
        * also available quadratic trend 'ctt', and no constant 'nc'

    method : string
        currently only 'aeg' for augmented Engle-Granger test is available.
        default might change.
    maxlag : None or int
        keyword for `adfuller`, largest or given number of lags
    autolag : string
        keyword for `adfuller`, lag selection criterion.

        * if None, then maxlag lags are used without lag search
        * if 'AIC' (default) or 'BIC', then the number of lags is chosen
          to minimize the corresponding information criterion
        * 't-stat' based choice of maxlag.  Starts with maxlag and drops a
          lag until the t-statistic on the last lag length is significant
          using a 5%-sized test

    return_results : bool
        for future compatibility, currently only tuple available.
        If True, then a results instance is returned. Otherwise, a tuple
        with the test outcome is returned.
        Set `return_results=False` to avoid future changes in return.

    Returns
    -------
    coint_t : float
        t-statistic of unit-root test on residuals
    pvalue : float
        MacKinnon's approximate, asymptotic p-value based on MacKinnon (1994)
    crit_value : dict
        Critical values for the test statistic at the 1 %, 5 %, and 10 %
        levels based on regression curve. This depends on the number of
        observations.

    Notes
    -----
    ...
    """
```
```python
from statsmodels.tsa.stattools import coint

print(coint(a_price, b_price))

# Output:
# (-3.9532731584015215, 0.008362293067615467, array([-3.95232129, -3.36700631, -3.06583125]))
```

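The returned t-statistic (-3.9533) is below the 1% critical value (-3.9523), so the null hypothesis of no cointegration is rejected at the 1% significance level. The p-value (0.0084) is also below 0.01. The two series can therefore be regarded as cointegrated.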
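The comparison can be written out explicitly. The sketch below reuses the numbers from the `coint` output above; the helper name `rejection_levels` is hypothetical, and the critical values are assumed to be ordered 1%, 5%, 10%, as described in the docstring. A plain tuple stands in for the NumPy array that statsmodels returns.

```python
# Values copied from the coint output above.
coint_t = -3.9532731584015215
pvalue = 0.008362293067615467
crit_values = (-3.95232129, -3.36700631, -3.06583125)


def rejection_levels(t_stat, crit_values, levels=("1%", "5%", "10%")):
    """Return the significance levels at which 'no cointegration' is rejected."""
    # The null is rejected when the statistic is more negative than the critical value.
    return [level for level, cv in zip(levels, crit_values) if t_stat < cv]


print(rejection_levels(coint_t, crit_values))  # ['1%', '5%', '10%']
print(round(crit_values[0] - coint_t, 6))      # margin at the 1% level
```

The statistic clears the 1% critical value by only about 0.001, so the rejection at that level is a narrow one, while the 5% and 10% levels are cleared comfortably.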

Reference:

《统计套利：理论与实战》, by 金志宏

Original author: 敲代码的quant
Original post: https://blog.csdn.net/FrankieHello/article/details/86770852