I have a numpy float array like:
v = np.array([1.0,1.0,2.0,2.0,2.0,2.0,...])
I need to identify all the constant segments in the array, like:
[{value:1.0,location:0,duration:2},..]
Efficiency is the main metric.
Best answer: Here's one approach:
import numpy as np

def island_props(v):
    # Compare one-off shifted slices element-wise to get a mask of the
    # start and stop positions of each island, then get the
    # corresponding indices.
    mask = np.concatenate(([True], v[1:] != v[:-1], [True]))
    loc0 = np.flatnonzero(mask)

    # Get the start locations
    loc = loc0[:-1]

    # The values are the input array indexed by the start locations.
    # The lengths are the differences between consecutive boundary indices.
    return v[loc], loc, np.diff(loc0)
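To make the boundary-mask idea concrete, the intermediate arrays can be traced on a small input. This is only an illustrative walk-through of the same steps; the names `starts` and `lengths` are chosen here for readability and are not part of the answer's function.

```python
import numpy as np

v = np.array([1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 5.0, 2.0])

# True wherever an element differs from its predecessor. The padded True
# at the front marks the start of the first island; the padded True at
# the end marks the stop of the last one.
mask = np.concatenate(([True], v[1:] != v[:-1], [True]))
# mask -> [T, F, T, F, F, F, T, T, T]

# Indices of all boundaries, including the final one at len(v).
loc0 = np.flatnonzero(mask)
# loc0 -> [0, 2, 6, 7, 8]

starts = loc0[:-1]        # [0, 2, 6, 7]
lengths = np.diff(loc0)   # [2, 4, 1, 1]
```

Because every step is a vectorized NumPy operation, no Python-level loop runs over the elements.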
Sample run:
In [143]: v
Out[143]: array([ 1., 1., 2., 2., 2., 2., 5., 2.])
In [144]: value, location, lengths = island_props(v)
In [145]: value
Out[145]: array([ 1., 2., 5., 2.])
In [146]: location
Out[146]: array([0, 2, 6, 7])
In [147]: lengths
Out[147]: array([2, 4, 1, 1])
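The question asks for a list of records rather than three arrays. A minimal sketch of that conversion follows, assuming a 1-D input; `island_records` is a hypothetical helper name, not part of the original answer, and it repeats the same boundary logic as `island_props` so that it runs on its own.

```python
import numpy as np

def island_records(v):
    # Hypothetical helper: same boundary logic as island_props, but
    # returns a list of dicts in the shape the question asks for.
    v = np.asarray(v)
    if v.size == 0:
        # Assumption: an empty input yields no segments.
        return []
    mask = np.concatenate(([True], v[1:] != v[:-1], [True]))
    loc0 = np.flatnonzero(mask)
    loc = loc0[:-1]
    # tolist() converts NumPy scalars to plain Python floats and ints.
    return [
        {"value": val, "location": start, "duration": length}
        for val, start, length in zip(
            v[loc].tolist(), loc.tolist(), np.diff(loc0).tolist()
        )
    ]

records = island_records(np.array([1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 5.0, 2.0]))
```

Building the dicts takes one Python-level iteration per segment, so when efficiency is the priority it is probably better to keep the three arrays and convert only at the point where records are actually needed.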
Runtime test
Other approaches:
import itertools

def MSeifert(a):
    return [{'value': k, 'duration': len(list(v))} for k, v in
            itertools.groupby(a.tolist())]

def Kasramvd(a):
    # Split the array wherever consecutive elements differ
    return np.split(a, np.where(np.diff(a) != 0)[0] + 1)
Timings:
In [156]: v0 = np.array([1.0,1.0,2.0,2.0,2.0,2.0,5.0,2.0])
In [157]: v = np.tile(v0,10000)
In [158]: %timeit MSeifert(v)
...: %timeit Kasramvd(v)
...: %timeit island_props(v)
...:
10 loops, best of 3: 44.7 ms per loop
10 loops, best of 3: 36.1 ms per loop
10000 loops, best of 3: 140 µs per loop