python – Finding constant subarrays in a large numpy array

I have a numpy float array like this:

v = np.array([1.0,1.0,2.0,2.0,2.0,2.0,...])

I need to identify all constant segments in the array, in this form:

[{value:1.0,location:0,duration:2},..]

Efficiency is the main criterion.

Best answer: here is one approach:

import numpy as np

def island_props(v):
    # Compare each element with its predecessor to get a mask that is True
    # wherever a new island starts. The trailing True marks the end of the
    # array, so the mask holds the start and stop positions of every island.
    mask = np.concatenate(( [True], v[1:] != v[:-1], [True] ))
    loc0 = np.flatnonzero(mask)

    # Get the start locations (every boundary except the final stop)
    loc = loc0[:-1]

    # The values are the input array indexed by the start locations.
    # The lengths are the differences between consecutive boundaries.
    return v[loc], loc, np.diff(loc0)

Sample run:

In [143]: v
Out[143]: array([ 1.,  1.,  2.,  2.,  2.,  2.,  5.,  2.])

In [144]: value, location, lengths = island_props(v)

In [145]: value
Out[145]: array([ 1.,  2.,  5.,  2.])

In [146]: location
Out[146]: array([0, 2, 6, 7])

In [147]: lengths
Out[147]: array([2, 4, 1, 1])
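
The question asks for a list of records rather than three parallel arrays. The following is a minimal sketch of one way to bridge the two. `island_records` is a hypothetical helper name, not part of the original answer, and the sketch assumes a non-empty 1-D input (on an empty array, `island_props` raises an `IndexError`).

```python
import numpy as np

def island_props(v):
    # Same function as in the answer: island values, start indices, lengths.
    mask = np.concatenate(([True], v[1:] != v[:-1], [True]))
    loc0 = np.flatnonzero(mask)
    loc = loc0[:-1]
    return v[loc], loc, np.diff(loc0)

def island_records(v):
    # Hypothetical helper (an assumption, not from the original answer):
    # repackage the three arrays into the dict format used in the question.
    values, locations, durations = island_props(v)
    # .tolist() converts NumPy scalars to plain Python floats and ints.
    return [
        {"value": val, "location": loc, "duration": dur}
        for val, loc, dur in zip(values.tolist(),
                                 locations.tolist(),
                                 durations.tolist())
    ]

v = np.array([1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 5.0, 2.0])
records = island_records(v)
print(records[0])  # {'value': 1.0, 'location': 0, 'duration': 2}
```

Building the Python list costs time proportional to the number of islands, so when efficiency matters it is probably better to keep the array form for as long as possible.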

Runtime test

Other approaches:

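import itertools
def MSeifert(a):
    # Group consecutive equal values and count the size of each group.
    return [{'value': k, 'duration': len(list(v))} for k, v in 
             itertools.groupby(a.tolist())]

def Kasramvd(a):
    # Split the array at every index where the value changes.
    return np.split(a, np.where(np.diff(a) != 0)[0] + 1)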

Timings:

In [156]: v0 = np.array([1.0,1.0,2.0,2.0,2.0,2.0,5.0,2.0])

In [157]: v = np.tile(v0,10000)

In [158]: %timeit MSeifert(v)
     ...: %timeit Kasramvd(v)
     ...: %timeit island_props(v)
     ...: 
10 loops, best of 3: 44.7 ms per loop
10 loops, best of 3: 36.1 ms per loop
10000 loops, best of 3: 140 µs per loop
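
The timings above come from an IPython session. As a rough sketch for repeating the comparison in a plain script, the standard `timeit` module can be used instead of `%timeit`. The repeat counts below are arbitrary choices, `Kasramvd` here reads its own argument `a`, and no figures are shown because they depend on the machine and library versions. The script also checks that all three approaches agree on the run lengths before timing them.

```python
import itertools
import timeit

import numpy as np

def island_props(v):
    mask = np.concatenate(([True], v[1:] != v[:-1], [True]))
    loc0 = np.flatnonzero(mask)
    loc = loc0[:-1]
    return v[loc], loc, np.diff(loc0)

def MSeifert(a):
    return [{'value': k, 'duration': len(list(g))}
            for k, g in itertools.groupby(a.tolist())]

def Kasramvd(a):
    return np.split(a, np.where(np.diff(a) != 0)[0] + 1)

v0 = np.array([1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 5.0, 2.0])
v = np.tile(v0, 10000)

# Sanity check: the three approaches should report the same run lengths.
lengths = island_props(v)[2].tolist()
assert lengths == [d['duration'] for d in MSeifert(v)]
assert lengths == [len(s) for s in Kasramvd(v)]

# Time each function: best of 3 repeats, 5 calls per repeat (arbitrary).
for fn in (MSeifert, Kasramvd, island_props):
    best = min(timeit.repeat(lambda: fn(v), number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {best * 1e3:.3f} ms per call")
```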