python – Creating permutations with the same autocorrelation

July 21, 2019

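My question is similar to this one, but with the difference that I need an array of zeros and ones as output. I have an original time series of zeros and ones with high autocorrelation (i.e., the ones are clustered). For some significance testing I need to create random arrays with the same number of zeros and ones, that is, permutations of the original array. However, the autocorrelation should also stay the same as (or similar to) the original, so a simple np.random.permutation does not help me.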
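To make the problem concrete, the following minimal sketch shows how a plain shuffle destroys the clustering. The helper name lag1_autocorr and the synthetic test sequence are assumptions introduced here for illustration; they are not part of the original question.

```python
import numpy as np

def lag1_autocorr(x):
    """Pearson correlation between x[:-1] and x[1:] (lag-1 autocorrelation)."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[:-1], x[1:])[0, 1]

# Synthetic, strongly clustered series: alternating blocks of 100 zeros and 100 ones.
x = np.repeat([0, 1] * 50, 100)

rng = np.random.default_rng(0)
shuffled = rng.permutation(x)  # same number of zeros and ones, clustering lost

print(lag1_autocorr(x))         # close to 1
print(lag1_autocorr(shuffled))  # close to 0
```

The shuffled array keeps the counts of zeros and ones but not the run structure, which is why a different kind of permutation is needed.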

Since I am doing multiple realizations, I need a solution that is as fast as possible. Any help is much appreciated.

Best answer

According to the question you refer to, you would like to permute x such that

np.corrcoef(x[0: len(x) - 1], x[1: ])[0][1]

does not change.
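For convenience, the expression above can be wrapped in a small helper. This is only a sketch; the name lag1_autocorr is an assumption introduced here and does not appear in the original answer.

```python
import numpy as np

def lag1_autocorr(x):
    """Pearson correlation between x[:-1] and x[1:] (lag-1 autocorrelation)."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[:-1], x[1:])[0, 1]

# A clustered sequence has a positive lag-1 autocorrelation,
# while a strictly alternating one is perfectly anti-correlated.
clustered = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1])
alternating = np.array([0, 1] * 8)

print(lag1_autocorr(clustered))
print(lag1_autocorr(alternating))
```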

Say the sequence x is composed of

z1 o1 z2 o2 z3 o3 … zk ok,

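where each zi is a run of 0s, and each oi is a run of 1s. (There are four cases, depending on whether the sequence starts with 0s or 1s, and whether it ends with 0s or 1s, but they are all the same in principle.)

Suppose p and q are each permutations of {1, …, k}, and consider the sequence

zp[1] oq[1] zp[2] oq[2] zp[3] oq[3] … zp[k] oq[k],

that is, the runs of 0s have been permuted among themselves, and the runs of 1s have been permuted among themselves.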

For example, suppose the original sequence is

0,0,0,1,1,0,1.

Then

0,0,0,1,0,1,1,

is such a permutation, as well as

0,1,1,0,0,0,1,

and

0,1,0,0,0,1,1.

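Performing this permutation does not change the correlation:

- within each run, the differences are the same
- the boundaries between the runs are the same as before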
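The two points above can be checked on the small example. The lag-1 correlation of a 0/1 sequence depends only on how many adjacent pairs of each kind (00, 01, 10, 11) it contains, and permuting runs among themselves leaves those counts unchanged. The helper names pair_counts and lag1_autocorr below are assumptions introduced for this sketch.

```python
import numpy as np

def lag1_autocorr(x):
    """Pearson correlation between x[:-1] and x[1:]."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[:-1], x[1:])[0, 1]

def pair_counts(x):
    """Counts of adjacent pairs, in the order (00, 01, 10, 11)."""
    x = np.asarray(x, dtype=int)
    return np.bincount(2 * x[:-1] + x[1:], minlength=4)

original = [0, 0, 0, 1, 1, 0, 1]
permuted = [
    [0, 0, 0, 1, 0, 1, 1],  # the two 1-runs swapped
    [0, 1, 1, 0, 0, 0, 1],  # the two 0-runs swapped
    [0, 1, 0, 0, 0, 1, 1],  # both swapped
]

reference_counts = pair_counts(original)
reference_corr = lag1_autocorr(original)
for seq in permuted:
    # Same adjacent-pair counts, hence the same lag-1 correlation.
    assert np.array_equal(pair_counts(seq), reference_counts)
    assert np.isclose(lag1_autocorr(seq), reference_corr)
```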

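Therefore, this gives a method for generating permutations that do not affect the correlation. (Also, see at the end another far simpler and more efficient method that can work in many common cases.)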

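We start with the function preprocess, which takes the sequence and returns a tuple starts_with_zero, zeros, ones, indicating, respectively,

- whether x begins with 0
- the 0 runs
- the 1 runs

In code, this is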

import numpy as np
import itertools

def preprocess(x):
    def find_runs(x, val):
        # Mark positions equal to val, padded with 0 on both sides so that
        # every run has a detectable start and end.
        matches = np.concatenate(([0], np.equal(x, val).view(np.int8), [0]))
        absdiff = np.abs(np.diff(matches))
        # Each row holds the (start, end) indices of one run.
        ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
        return ranges[:, 1] - ranges[:, 0]

    starts_with_zero = x[0] == 0

    run_lengths_0 = find_runs(x, 0)
    run_lengths_1 = find_runs(x, 1)
    # One (float) array per run, in order of appearance.
    zeros = [np.zeros(l) for l in run_lengths_0]
    ones = [np.ones(l) for l in run_lengths_1]

    return starts_with_zero, zeros, ones

(This function borrows from an answer to this question.)
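To see what the inner helper computes, here is find_runs lifted out of preprocess as a stand-alone function (purely an illustrative restatement) and applied to the small example sequence from earlier.

```python
import numpy as np

def find_runs(x, val):
    """Lengths of the consecutive runs of `val` in x, in order of appearance."""
    matches = np.concatenate(([0], np.equal(x, val).view(np.int8), [0]))
    absdiff = np.abs(np.diff(matches))
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges[:, 1] - ranges[:, 0]

x = np.array([0, 0, 0, 1, 1, 0, 1])
zero_runs = find_runs(x, 0)
one_runs = find_runs(x, 1)
print(zero_runs)  # [3 1]: a run of three 0s, then a single 0
print(one_runs)   # [2 1]: a run of two 1s, then a single 1
```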

To use this function, you would do, e.g.,

x = (np.random.uniform(size=10000) > 0.2).astype(int)

starts_with_zero, zeros, ones = preprocess(x)

Now we write a function that permutes the 0 runs and the 1 runs among themselves, and concatenates the results:

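def get_next_permutation(starts_with_zero, zeros, ones):
    # Shuffle the order of the runs in place (each run itself is unchanged).
    np.random.shuffle(zeros)
    np.random.shuffle(ones)

    # Interleave the runs, beginning with the same value as the original.
    # zip_longest pads the shorter list with empty arrays
    # (in Python 2 this function was called itertools.izip_longest).
    if starts_with_zero:
        all_ = itertools.zip_longest(zeros, ones, fillvalue=np.array([]))
    else:
        all_ = itertools.zip_longest(ones, zeros, fillvalue=np.array([]))
    all_ = [e for p in all_ for e in p]

    x_tag = np.concatenate(all_)

    return x_tag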

To generate another permutation (having the same correlation), you would use

x_tag = get_next_permutation(starts_with_zero, zeros, ones)

To generate many permutations, you could do:

starts_with_zero, zeros, ones = preprocess(x)

for i in range(n_permutations):  # n_permutations = number of permutations needed
    x_tag = get_next_permutation(starts_with_zero, zeros, ones)
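Since speed matters when many permutations are needed, one possible variation is to work on the array of run lengths directly and rebuild the sequence with np.repeat, instead of shuffling lists of arrays. This is a sketch of the same idea, not the code from the original answer; the names run_lengths and permute_runs are hypothetical, the input is assumed to be a non-empty 0/1 array, and no timing has been measured here.

```python
import numpy as np

def run_lengths(x):
    """Return (first_value, lengths of the consecutive runs) for a non-empty 0/1 array."""
    x = np.asarray(x, dtype=int)
    change = np.flatnonzero(np.diff(x)) + 1          # indices where a new run starts
    bounds = np.concatenate(([0], change, [len(x)]))
    return x[0], np.diff(bounds)

def permute_runs(first_value, lengths, rng):
    """Shuffle the 0-runs and the 1-runs separately, then rebuild the sequence."""
    new_lengths = lengths.copy()
    # Runs alternate in value, so even positions hold runs of first_value
    # and odd positions hold runs of the other value.
    new_lengths[0::2] = rng.permutation(lengths[0::2])
    new_lengths[1::2] = rng.permutation(lengths[1::2])
    values = (first_value + np.arange(len(lengths))) % 2  # alternating labels
    return np.repeat(values, new_lengths)

rng = np.random.default_rng(0)
x = np.array([0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1])
first_value, lengths = run_lengths(x)
x_tag = permute_runs(first_value, lengths, rng)
print(x_tag)
```

The preprocessing is done once, and each call to permute_runs only shuffles two small arrays of run lengths. Unlike the functions above, this variant returns an integer array.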

Example

Say we run

x = (np.random.uniform(size=10000) > 0.2).astype(int)
print(np.corrcoef(x[0: len(x) - 1], x[1: ])[0][1])

starts_with_zero, zeros, ones = preprocess(x)

for i in range(10):
    x_tag = get_next_permutation(starts_with_zero, zeros, ones)

    print(x_tag[: 50])
    print(np.corrcoef(x_tag[0: len(x_tag) - 1], x_tag[1: ])[0][1])

Then we get:

0.00674330566615
[ 1.  1.  1.  1.  1.  0.  0.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  0.
  1.  1.  0.  1.  1.  1.  1.  0.  1.  1.  0.  0.  1.  0.  1.  1.  1.  1.
  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
0.00674330566615
[ 1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  0.  1.  1.  0.  1.  1.  1.  1.  1.  1.  0.  0.  1.  0.
  1.  1.  1.  1.  0.  0.  0.  1.  1.  1.  1.  1.  1.  1.]
0.00674330566615
[ 1.  1.  1.  1.  1.  0.  0.  1.  1.  1.  0.  0.  0.  0.  1.  0.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  0.  1.  1.
  1.  1.  1.  1.  1.  1.  0.  1.  0.  0.  1.  1.  1.  0.]
0.00674330566615
[ 1.  1.  1.  1.  0.  1.  0.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  0.
  1.  1.  1.  1.  1.  0.  0.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  0.  0.  1.]
0.00674330566615
[ 1.  1.  1.  1.  0.  0.  0.  0.  1.  1.  0.  1.  1.  0.  0.  1.  0.  1.
  1.  1.  0.  1.  0.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  0.  0.  1.
  0.  1.  1.  1.  1.  1.  1.  0.  1.  0.  1.  1.  1.  1.]
0.00674330566615
[ 1.  1.  0.  1.  1.  1.  0.  0.  1.  1.  0.  1.  1.  0.  0.  1.  1.  0.
  1.  1.  1.  0.  1.  1.  1.  1.  0.  0.  0.  1.  1.  1.  1.  1.  1.  1.
  0.  1.  1.  1.  1.  0.  1.  1.  0.  1.  0.  0.  1.  1.]
0.00674330566615
[ 1.  1.  0.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  0.  1.  1.  1.  1.  1.
  1.  1.  0.  1.  0.  1.  1.  0.  1.  0.  1.  1.  1.  1.]
0.00674330566615
[ 1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  0.  1.  1.  0.  1.  0.  1.  1.
  1.  1.  1.  0.  1.  0.  1.  1.  0.  1.  1.  1.  0.  1.  1.  1.  1.  0.
  0.  1.  1.  1.  0.  1.  1.  0.  1.  1.  0.  1.  1.  1.]
0.00674330566615
[ 1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  0.  1.  1.  1.
  0.  1.  1.  1.  1.  1.  1.  0.  1.  1.  0.  1.  1.  1.]
0.00674330566615
[ 1.  1.  0.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  0.  1.  0.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  0.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  0.  1.  0.  1.  0.  1.  1.  1.  1.  1.  1.  0.]

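Note that there is a much simpler solution if

- your sequence is of length n,
- there is some number m with m << n, and
- m! is much larger than the number of permutations you need.

In this case, simply divide your sequence into m (approximately) equal parts, and permute them randomly. As noted before, only the m - 1 boundaries change in a way that could affect the correlation. Since m << n, this is negligible.

To put some numbers on it, say you have a sequence with 10000 elements. It is well known that 20! = 2432902008176640000, which is probably far more permutations than you need. By dividing your sequence into 20 parts and permuting them, you affect at most 19 of the roughly 10000 adjacent pairs, which might be small enough. For these sizes, this is the method I would use.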
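The block approach described above can be sketched as follows. The function name block_permutation and the use of np.array_split are choices made for this illustration; the original answer describes the method only in words.

```python
import numpy as np

def block_permutation(x, m, rng):
    """Split x into m nearly equal contiguous blocks and shuffle the block order."""
    blocks = np.array_split(np.asarray(x), m)
    order = rng.permutation(m)
    return np.concatenate([blocks[i] for i in order])

rng = np.random.default_rng(0)
x = (rng.uniform(size=10000) > 0.2).astype(int)
x_tag = block_permutation(x, 20, rng)

# The number of zeros and ones is preserved exactly; the lag-1 correlation
# is preserved only approximately, because the 19 block boundaries change.
print(np.corrcoef(x[:-1], x[1:])[0, 1])
print(np.corrcoef(x_tag[:-1], x_tag[1:])[0, 1])
```

Unlike the run-based method, this one does not keep the correlation exactly equal, so it is worth checking that the deviation is acceptable for the test at hand.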