Sample Skewness, the Skewness of a Random Variable, and Third-Order Statistics: Relationships, Computation, and Estimation

January 6, 2024. Source: lppamber

Skewness

1. Definition of the skewness of a random variable

The skewness $\gamma_1$ of a random variable $X$ is its third standardized moment, defined as:

$$\gamma_1 = E\left[\left(\frac{X-\mu}{\sigma}\right)^3\right] = \frac{\mu_3}{\sigma^3} = \frac{E\left[(X-\mu)^3\right]}{\left(E\left[(X-\mu)^2\right]\right)^{3/2}} = \frac{\kappa_3}{\kappa_2^{3/2}},$$

where $\mu_3$ is the third central moment of $X$, $\sigma$ is the standard deviation of $X$, $E$ denotes expectation, $\kappa_3 = E\left[(X-\mu)^3\right]$ is the third cumulant of $X$, and $\kappa_2 = E\left[(X-\mu)^2\right]$ is the second cumulant of $X$.

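Note: for a random variable $X$, the first cumulant equals the expectation $E(X)$, the second cumulant equals the variance $V(X)$, and the third cumulant equals the third central moment $S(X)$. The fourth and higher cumulants, however, do not equal the central moments of the same order.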
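To make the note above concrete, the first four cumulants can be written in terms of the central moments $\mu_k = E\left[(X-\mu)^k\right]$. These are standard identities, added here for illustration rather than taken from the original post:

$$\kappa_1 = \mu, \qquad \kappa_2 = \mu_2, \qquad \kappa_3 = \mu_3, \qquad \kappa_4 = \mu_4 - 3\mu_2^2.$$

For a normal distribution, for example, $\mu_4 = 3\sigma^4$, so $\kappa_4 = 0$ even though $\mu_4 \neq 0$.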

The skewness can also be expressed in terms of raw moments (moments about the origin):

$$
\begin{aligned}
\gamma_1 &= E\left[\left(\frac{X-\mu}{\sigma}\right)^3\right] = \frac{E[X^3] - 3\mu E[X^2] + 3\mu^2 E[X] - \mu^3}{\left(E\left[(X-\mu)^2\right]\right)^{3/2}} \\
&= \frac{E[X^3] - 3\mu E[X^2] + 3\mu^3 - \mu^3}{\left(E\left[(X-\mu)^2\right]\right)^{3/2}} \\
&= \frac{E[X^3] - 3\mu E[X^2] + 2\mu^3}{\left(E\left[(X-\mu)^2\right]\right)^{3/2}} \\
&= \frac{E[X^3] - 3\mu\left(E[X^2] - \mu^2\right) - \mu^3}{\left(E\left[(X-\mu)^2\right]\right)^{3/2}} = \frac{E[X^3] - 3\mu\sigma^2 - \mu^3}{\sigma^3}.
\end{aligned}
$$
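The raw-moment form above can be checked numerically. The following is a minimal sketch; the function name `skewness_from_raw_moments` is made up for this illustration, and the two test distributions (Exponential(1), whose raw moments are $E[X^k] = k!$, and Bernoulli(0.2), whose raw moments all equal 0.2) are examples chosen here, not part of the original post.

```python
def skewness_from_raw_moments(m1: float, m2: float, m3: float) -> float:
    """Skewness from the raw moments m1 = E[X], m2 = E[X^2], m3 = E[X^3].

    Uses gamma_1 = (E[X^3] - 3*mu*sigma^2 - mu^3) / sigma^3.
    """
    variance = m2 - m1 ** 2  # sigma^2 = E[X^2] - mu^2
    if variance <= 0:
        raise ValueError("variance must be positive")
    third_central = m3 - 3 * m1 * variance - m1 ** 3
    return third_central / variance ** 1.5


# Exponential(1): E[X] = 1, E[X^2] = 2, E[X^3] = 6
exp_skew = skewness_from_raw_moments(1.0, 2.0, 6.0)

# Bernoulli(0.2): every raw moment equals p = 0.2
bern_skew = skewness_from_raw_moments(0.2, 0.2, 0.2)

print(exp_skew, bern_skew)
```

The two results should match the known closed forms: 2 for the exponential distribution and $(1-2p)/\sqrt{p(1-p)} = 1.5$ for Bernoulli with $p = 0.2$, up to floating-point error.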

2. Definition of sample skewness

The sample skewness of a sample of $n$ values ($n \geq 3$) is defined as:

$$b_1 = \frac{m_3}{s^3} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^3}{\left[\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2\right]^{3/2}},$$

where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, and $m_3$ is the third central moment of the sample.
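The definition of $b_1$ translates directly into code. This is a plain-Python sketch; the function name `sample_skewness_b1` and the sample data are hypothetical and chosen only for illustration.

```python
def sample_skewness_b1(xs):
    """Sample skewness b1 = m3 / s^3.

    m3 is the third central sample moment (divisor n);
    s^2 is the sample variance (divisor n - 1).
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)
    if s2 == 0:
        raise ValueError("sample variance is zero; skewness is undefined")
    return m3 / s2 ** 1.5


right_skewed = [1, 2, 3, 4, 10]   # one large value pulls the right tail
symmetric = [1, 2, 3, 4, 5]       # symmetric about its mean

b1_right = sample_skewness_b1(right_skewed)
b1_sym = sample_skewness_b1(symmetric)
print(b1_right, b1_sym)
```

For `right_skewed` the mean is 4, so $m_3 = 180/5 = 36$ and $s^2 = 50/4 = 12.5$, giving a positive $b_1$; the symmetric sample gives zero because its cubed deviations cancel.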

3. Estimating the population skewness

In practice, much of the literature, especially where small samples are concerned, uses the following estimator of the population skewness:

$$G_1 = \frac{\kappa_3}{\kappa_2^{3/2}} = \frac{n^2}{(n-1)(n-2)}\,\frac{m_3}{s^3} = \frac{\sqrt{n(n-1)}}{n-2}\,\frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^3}{\left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\right]^{3/2}},$$

where, in this formula, $\kappa_3$ denotes the unique symmetric unbiased estimator of the third cumulant, and $\kappa_2 = s^2$ denotes the symmetric unbiased estimator of the second cumulant (that is, the sample variance).

This adjusted Fisher-Pearson standardized moment coefficient $G_1$ is the formula used by statistical software such as Excel, Minitab, SAS, and SPSS, as well as by the Pandas library.
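As a rough illustration of how $G_1$ relates to the unadjusted quantities, here is a plain-Python sketch. The function name `adjusted_skewness_g1` and the sample data are hypothetical; this is not the code of any of the packages named above.

```python
import math


def adjusted_skewness_g1(xs):
    """Adjusted Fisher-Pearson coefficient G1 = sqrt(n(n-1)) / (n-2) * m3 / m2^(3/2).

    m2 and m3 are the second and third central sample moments (divisor n).
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    if m2 == 0:
        raise ValueError("variance is zero; skewness is undefined")
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5


data = [1, 2, 3, 4, 10]
n = len(data)
g1_adjusted = adjusted_skewness_g1(data)

# Cross-check against the equivalent form n^2 / ((n-1)(n-2)) * m3 / s^3
mean = sum(data) / n
m3 = sum((x - mean) ** 3 for x in data) / n
s2 = sum((x - mean) ** 2 for x in data) / (n - 1)
g1_via_s = n ** 2 / ((n - 1) * (n - 2)) * m3 / s2 ** 1.5

print(g1_adjusted, g1_via_s)
```

Both forms should agree up to floating-point error. If pandas or SciPy is installed, `pandas.Series(data).skew()` and `scipy.stats.skew(data, bias=False)` are expected to return the same value.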

Excerpt from the pandas source

def nanskew(values, axis=None, skipna=True, mask=None):
    """
    Compute the sample skewness.

    The statistic computed here is the adjusted Fisher-Pearson standardized
    moment coefficient G1. The algorithm computes this coefficient directly
    from the second and third central moment.
    """
    # ... (code omitted in the original excerpt)

    mean = values.sum(axis, dtype=np.float64) / count
    if axis is not None:
        mean = np.expand_dims(mean, axis)

    adjusted = values - mean
    if skipna:
        np.putmask(adjusted, mask, 0)
    adjusted2 = adjusted ** 2
    adjusted3 = adjusted2 * adjusted
    m2 = adjusted2.sum(axis, dtype=np.float64)
    m3 = adjusted3.sum(axis, dtype=np.float64)

    # floating point error
    #
    # #18044 in _libs/windows.pyx calc_skew follow this behavior
    # to fix the fperr to treat m2 <1e-14 as zero
    m2 = _zero_out_fperr(m2)
    m3 = _zero_out_fperr(m3)

    with np.errstate(invalid='ignore', divide='ignore'):
        result = (count * (count - 1) ** 0.5 / (count - 2)) * (m3 / m2 ** 1.5)

    # ... (code omitted in the original excerpt)
        return result
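The coefficient in the excerpt, `count * (count - 1) ** 0.5 / (count - 2)`, looks different from the $\sqrt{n(n-1)}/(n-2)$ factor in the formula for $G_1$ because `m2` and `m3` in the code are plain sums of squared and cubed deviations, not divided by the count. Writing $S_2 = \sum_i (x_i-\bar{x})^2$ and $S_3 = \sum_i (x_i-\bar{x})^3$ (notation introduced here for this derivation), the two forms are equivalent:

$$
\begin{aligned}
G_1 &= \frac{\sqrt{n(n-1)}}{n-2}\,\frac{S_3/n}{\left(S_2/n\right)^{3/2}}
= \frac{\sqrt{n(n-1)}}{n-2}\,\sqrt{n}\,\frac{S_3}{S_2^{3/2}} \\
&= \frac{n\sqrt{n-1}}{n-2}\,\frac{S_3}{S_2^{3/2}}.
\end{aligned}
$$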

References

Skewness, Wikipedia

Joanes D N, Gill C A. Comparing measures of sample skewness and kurtosis[J]. Journal of the Royal Statistical Society: Series D (The Statistician), 1998, 47(1): 183-189.

binti Yusoff S, Wah Y B. Comparison of conventional measures of skewness and kurtosis for small sample size[C]//2012 International Conference on Statistics in Science, Business and Engineering (ICSSBE). IEEE, 2012: 1-6.

Pebay P P. Formulas for robust, one-pass parallel computation of covariances and arbitrary-order statistical moments[R]. Sandia National Laboratories, 2008.

Online skewness kurtosis computing

Online linear regression computing

Pandas

    Original author: lppamber
    Original URL: https://blog.csdn.net/u011503666/article/details/109545400