Sample skewness, the skewness of a random variable, and third-order statistics: relationships, computation, and estimation

Skewness

1. Definition of the skewness of a random variable

The skewness $\gamma_1$ of a random variable $X$ is its third standardized moment, defined as:

$$
\gamma_1 = E\left[\left(\frac{X-\mu}{\sigma}\right)^3\right] = \frac{\mu_3}{\sigma^3} = \frac{E\left[(X-\mu)^3\right]}{\left(E\left[(X-\mu)^2\right]\right)^{3/2}} = \frac{\kappa_3}{\kappa_2^{3/2}},
$$

where $\mu = E[X]$ is the mean of $X$, $\mu_3$ is its third central moment, $\sigma$ is its standard deviation, $E$ denotes expectation, $\kappa_3 = E\left[(X-\mu)^3\right]$ is the third cumulant of $X$, and $\kappa_2 = E\left[(X-\mu)^2\right]$ is its second cumulant.

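Note: for a random variable $X$, the first cumulant equals the expectation $E(X)$, the second cumulant equals the variance $V(X)$, and the third cumulant equals the third central moment $S(X)$. The fourth- and higher-order cumulants, however, are not equal to the central moments of the same order.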
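As an illustration of where the equality breaks down, the standard fourth-order relation can be written as follows. The symbols $\mu_2$ and $\mu_4$ (second and fourth central moments) and $\kappa_4$ (fourth cumulant) are introduced here for the example only:

$$
\kappa_4 = \mu_4 - 3\mu_2^2 .
$$

For a normal distribution, $\mu_4 = 3\sigma^4$ and $\mu_2 = \sigma^2$, so $\kappa_4 = 0$ even though $\mu_4 > 0$.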

The skewness can also be expressed in terms of raw moments (moments about the origin). Expanding the cube and using $E[X] = \mu$:

$$
\begin{aligned}
\gamma_1 &= E\left[\left(\frac{X-\mu}{\sigma}\right)^3\right] = \frac{E[X^3] - 3\mu E[X^2] + 3\mu^2 E[X] - \mu^3}{\left(E\left[(X-\mu)^2\right]\right)^{3/2}} \\
&= \frac{E[X^3] - 3\mu E[X^2] + 3\mu \cdot \mu^2 - \mu^3}{\left(E\left[(X-\mu)^2\right]\right)^{3/2}} \\
&= \frac{E[X^3] - 3\mu E[X^2] + 2\mu^3}{\left(E\left[(X-\mu)^2\right]\right)^{3/2}} \\
&= \frac{E[X^3] - 3\mu\left(E[X^2] - \mu^2\right) - \mu^3}{\left(E\left[(X-\mu)^2\right]\right)^{3/2}} = \frac{E[X^3] - 3\mu\sigma^2 - \mu^3}{\sigma^3}.
\end{aligned}
$$
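The raw-moment form can be checked numerically. The following is a minimal sketch, not part of the original post: the function name `skewness_from_raw_moments` is hypothetical, and the test case assumes an exponential distribution with rate 1, whose raw moments are $E[X^k] = k!$ and whose skewness is known to be 2.

```python
def skewness_from_raw_moments(m1: float, m2: float, m3: float) -> float:
    """Skewness from the raw moments m1 = E[X], m2 = E[X^2], m3 = E[X^3].

    Hypothetical helper implementing
    (E[X^3] - 3*mu*sigma^2 - mu^3) / sigma^3.
    """
    variance = m2 - m1 ** 2  # sigma^2 = E[X^2] - mu^2
    if variance <= 0:
        raise ValueError("variance must be positive for skewness to be defined")
    third_central = m3 - 3 * m1 * variance - m1 ** 3
    return third_central / variance ** 1.5


# Exponential distribution with rate 1 (assumed example): E[X^k] = k!
gamma1 = skewness_from_raw_moments(1.0, 2.0, 6.0)
print(gamma1)  # 2.0
```

Here $\sigma^2 = 2 - 1 = 1$ and the numerator is $6 - 3 - 1 = 2$, which matches the known skewness of the exponential distribution.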

2. Definition of sample skewness

For a sample of $n$ values ($n \geq 3$), the sample skewness is defined as:

$$
b_1 = \frac{m_3}{s^3} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^3}{\left[\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar x)^2\right]^{3/2}},
$$

where $\bar x$ is the sample mean, $s$ is the sample standard deviation (computed with the $n-1$ denominator), and $m_3$ is the third central moment of the sample.
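A minimal pure-Python sketch of this definition follows. The function name `sample_skewness_b1` and the four-point data set are assumptions chosen so the arithmetic is easy to follow by hand; they do not come from the original post.

```python
def sample_skewness_b1(xs):
    """Sample skewness b1 = m3 / s^3 (hypothetical helper).

    m3 uses the 1/n denominator; s^2 uses the 1/(n-1) denominator.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("the definition in the text assumes n >= 3")
    mean = sum(xs) / n
    devs = [x - mean for x in xs]
    m3 = sum(d ** 3 for d in devs) / n        # third central moment of the sample
    s2 = sum(d ** 2 for d in devs) / (n - 1)  # sample variance s^2
    if s2 == 0:
        raise ValueError("skewness is undefined for a constant sample")
    return m3 / s2 ** 1.5


# Deviations from the mean (1.0) are -1, -1, -1, 3,
# so m3 = 24/4 = 6 and s^2 = 12/3 = 4, giving b1 = 6 / 8.
print(sample_skewness_b1([0.0, 0.0, 0.0, 4.0]))  # 0.75
```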

3. Estimating the population skewness

In fact, in much of the literature, a commonly used estimator of the population skewness, especially for small samples, is:

$$
G_1 = \frac{k_3}{k_2^{3/2}} = \frac{n^2}{(n-1)(n-2)}\,\frac{m_3}{s^3} = \frac{\sqrt{n(n-1)}}{n-2}\,\frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^3}{\left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^2\right]^{3/2}},
$$

where $k_3$ is the unique symmetric unbiased estimator of the third cumulant $\kappa_3$, and $k_2 = s^2$ (the sample variance) is the symmetric unbiased estimator of the second cumulant $\kappa_2$. Note that the last expression normalizes the second moment by $1/n$, whereas $s^2$ uses $1/(n-1)$; this is why the two prefactors differ.
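The two forms of $G_1$, one built on $m_3/s^3$ and one built on the ratio of $1/n$-normalized moments, can be compared numerically. This is a sketch under stated assumptions: the function name `skewness_estimators`, the variable name `g1` for the $1/n$-normalized ratio, and the sample data are all hypothetical.

```python
import math


def skewness_estimators(xs):
    """Return (b1, g1, G1 via b1, G1 via g1) for a sequence of numbers.

    Hypothetical helper:
      b1 = m3 / s^3        with s^2 normalized by 1/(n-1)
      g1 = m3 / m2^(3/2)   with m2 normalized by 1/n
    """
    n = len(xs)
    if n < 3:
        raise ValueError("G1 needs at least 3 observations (n - 2 in the denominator)")
    mean = sum(xs) / n
    devs = [x - mean for x in xs]
    m2 = sum(d ** 2 for d in devs) / n  # second central moment, 1/n
    m3 = sum(d ** 3 for d in devs) / n  # third central moment, 1/n
    if m2 == 0:
        raise ValueError("skewness is undefined for a constant sample")
    s2 = m2 * n / (n - 1)               # sample variance, 1/(n-1)
    b1 = m3 / s2 ** 1.5
    g1 = m3 / m2 ** 1.5
    G1_from_b1 = n ** 2 / ((n - 1) * (n - 2)) * b1
    G1_from_g1 = math.sqrt(n * (n - 1)) / (n - 2) * g1
    return b1, g1, G1_from_b1, G1_from_g1


b1, g1, G1_a, G1_b = skewness_estimators([0.0, 0.0, 0.0, 4.0])
print(b1, g1, G1_a, G1_b)
```

For this sample, `b1` is 0.75 and both `G1` values come out at about 2.0, up to floating-point rounding. The gap between `b1` and `G1` shows how large the adjustment is at small $n$.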

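The adjusted Fisher-Pearson standardized moment coefficient $G_1$ is the formula used by statistical software such as Excel, Minitab, SAS, and SPSS, as well as by the pandas library.

pandas source excerpt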

def nanskew(values, axis=None, skipna=True, mask=None):
    """
    Compute the sample skewness.

    The statistic computed here is the adjusted Fisher-Pearson standardized
    moment coefficient G1. The algorithm computes this coefficient directly
    from the second and third central moment.
    """
    # ... (lines omitted in this excerpt)

    mean = values.sum(axis, dtype=np.float64) / count
    if axis is not None:
        mean = np.expand_dims(mean, axis)

    adjusted = values - mean
    if skipna:
        np.putmask(adjusted, mask, 0)
    adjusted2 = adjusted ** 2
    adjusted3 = adjusted2 * adjusted
    m2 = adjusted2.sum(axis, dtype=np.float64)
    m3 = adjusted3.sum(axis, dtype=np.float64)

    # floating point error
    #
    # #18044 in _libs/windows.pyx calc_skew follow this behavior
    # to fix the fperr to treat m2 <1e-14 as zero
    m2 = _zero_out_fperr(m2)
    m3 = _zero_out_fperr(m3)

    with np.errstate(invalid='ignore', divide='ignore'):
        result = (count * (count - 1) ** 0.5 / (count - 2)) * (m3 / m2 ** 1.5)

    # ... (lines omitted in this excerpt)
        return result
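The prefactor in the `result` line looks different from the one in the formula for $G_1$ because `m2` and `m3` in the excerpt are plain sums of squared and cubed deviations, not averages. Writing $M_2 = \sum_{i}(x_i-\bar x)^2$ and $M_3 = \sum_{i}(x_i-\bar x)^3$ for those sums (symbols introduced here for the derivation only) and $n$ for `count`, the two expressions agree:

$$
G_1 = \frac{\sqrt{n(n-1)}}{n-2}\,\frac{M_3/n}{\left(M_2/n\right)^{3/2}} = \frac{\sqrt{n(n-1)}}{n-2}\,\sqrt{n}\,\frac{M_3}{M_2^{3/2}} = \frac{n\sqrt{n-1}}{n-2}\,\frac{M_3}{M_2^{3/2}} .
$$

For the sample $0, 0, 0, 4$ this gives $M_2 = 12$, $M_3 = 24$, and $G_1 = \frac{4\sqrt{3}}{2}\cdot\frac{24}{12^{3/2}} = 2$.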

References

  1. Skewness – Wikipedia
  2. Joanes D N, Gill C A. Comparing measures of sample skewness and kurtosis[J]. Journal of the Royal Statistical Society: Series D (The Statistician), 1998, 47(1): 183-189.
  3. binti Yusoff S, Wah Y B. Comparison of conventional measures of skewness and kurtosis for small sample size[C]//2012 International Conference on Statistics in Science, Business and Engineering (ICSSBE). IEEE, 2012: 1-6.
  4. Pebay P P. Formulas for robust, one-pass parallel computation of covariances and arbitrary-order statistical moments[R]. Sandia National Laboratories, 2008.
  5. Online skewness kurtosis computing
  6. Online linear regression computing
  7. pandas
Original author: lppamber
Original URL: https://blog.csdn.net/u011503666/article/details/109545400