How to get the nearest number divisible by 100 in pandas (Python)



input       output
11700.15    11700
11695.20    11700
11661.00    11700
11630.40    11700
11666.10    11700
11600.30    11700
11600.00    11600
11555.40    11600
11655.20    11600
11699.00    11600
11701.55    11700
11799.44    11700
11604.65    11700
11600.33    11700
11599.65    11600


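Best answer: As far as I know, there is no intuitive way to do this without explicit iteration. Each output depends on the value carried over from the previous rows, and that kind of loop is not a good fit for numpy and pandas. However, the problem is O(n), which makes it a good target for the numba library and lets us write a very efficient solution.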

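A note on my solution: I use (a + threshold // 2) // threshold * threshold, which looks verbose compared with np.round(a, decimals=-2). This is because of numba's nopython=True flag, which is not compatible with the np.round function.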
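The rounding expression can be checked on its own, outside numba. This is a minimal sketch assuming only NumPy, and the helper name `round_to_multiple` is hypothetical. One difference is worth knowing. The expression rounds exact halves up, while `np.round` rounds exact halves to the nearest even value, so the two disagree only on ties such as 11650.

```python
import numpy as np

def round_to_multiple(a, step):
    """Hypothetical helper: round to the nearest multiple of `step`, ties rounded up.

    Uses the same expression as the answer: (a + step // 2) // step * step.
    """
    return (a + step // 2) // step * step

values = np.array([11700.15, 11649.99, 11650.0, 11750.0])
print(round_to_multiple(values, 100))  # ties go up: 11650 -> 11700
print(np.round(values, -2))            # ties go to even: 11650 -> 11600
```

For the values in the question, which contain no exact ties, both forms give the same rounded numbers.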

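import numpy as np
from numba import jit

@jit(nopython=True)
def cumsum_with_threshold(a, threshold):
    """
    Rounds values in an array, propagating the last rounded value until
    the input moves more than `threshold` away from the last anchor value.
    :param a: the array to round
    :param threshold: the distance at which to stop propagation
    :return: rounded output array
    """
    s = a.shape[0]
    o = np.empty(s)
    d = a[0]  # anchor: the input value at the last switch
    # Round every element to the nearest multiple of threshold (ties up)
    r = (a + threshold // 2) // threshold * threshold
    o[0] = r[0]

    for i in range(1, s):
        if np.abs(a[i] - d) > threshold:
            # Moved far enough: take the new rounded value and reset the anchor
            o[i] = r[i]
            d = a[i]
        else:
            # Otherwise keep propagating the previous output
            o[i] = o[i - 1]

    return o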


a = df['input'].values
pd.Series(cumsum_with_threshold(a, 100))
0     11700.0
1     11700.0
2     11700.0
3     11700.0
4     11700.0
5     11700.0
6     11600.0
7     11600.0
8     11600.0
9     11600.0
10    11700.0
11    11700.0
12    11700.0
13    11600.0
14    11600.0
dtype: float64
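If numba is not installed, the same loop can run as plain Python to check the logic against the output above. It is much slower on large arrays. This is a minimal sketch assuming only NumPy, and `sticky_round` is a hypothetical name that is not part of the answer.

```python
import numpy as np

def sticky_round(a, threshold):
    """Hypothetical plain-Python version of cumsum_with_threshold (no @jit)."""
    a = np.asarray(a, dtype=float)
    # Round every value to the nearest multiple of threshold (ties up)
    r = (a + threshold // 2) // threshold * threshold
    o = np.empty(a.shape[0])
    o[0] = r[0]
    d = a[0]  # anchor: the raw input value at the last switch
    for i in range(1, a.shape[0]):
        if abs(a[i] - d) > threshold:
            o[i] = r[i]  # moved far enough: switch to the new rounded value
            d = a[i]
        else:
            o[i] = o[i - 1]  # otherwise repeat the previous output
    return o

values = [11700.15, 11695.20, 11661.00, 11630.40, 11666.10,
          11600.30, 11600.00, 11555.40, 11655.20, 11699.00,
          11701.55, 11799.44, 11604.65, 11600.33, 11599.65]
print(sticky_round(values, 100).tolist())
```

This yields the same 15 values as the pd.Series output above, including 11600 at index 13.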


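The output above differs from the expected output in the question at index 13 (11600.33, expected 11700). If the distance is measured from the last rounded value instead of the last raw input, the function reproduces the expected output. Only the update of d in the loop changes:

for i in range(1, s):
    if np.abs(a[i] - d) > threshold:
        o[i] = r[i]
        # OLD: d = a[i]
        d = r[i]
    else:
        o[i] = o[i - 1]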
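Here is a self-contained check of this variant against the expected output from the question. It is a minimal sketch assuming only NumPy, and it assumes the rest of the function is unchanged: in particular, d still starts at a[0], not r[0]. The function name `sticky_round_rounded_anchor` is hypothetical.

```python
import numpy as np

def sticky_round_rounded_anchor(a, threshold):
    """Hypothetical plain-Python version of the variant loop (no @jit)."""
    a = np.asarray(a, dtype=float)
    r = (a + threshold // 2) // threshold * threshold
    o = np.empty(a.shape[0])
    o[0] = r[0]
    d = a[0]  # assumption: the first anchor is still the raw first input
    for i in range(1, a.shape[0]):
        if abs(a[i] - d) > threshold:
            o[i] = r[i]
            d = r[i]  # the variant: anchor on the rounded value
        else:
            o[i] = o[i - 1]
    return o

inputs = [11700.15, 11695.20, 11661.00, 11630.40, 11666.10,
          11600.30, 11600.00, 11555.40, 11655.20, 11699.00,
          11701.55, 11799.44, 11604.65, 11600.33, 11599.65]
# The "output" column from the question
expected = [11700] * 6 + [11600] * 4 + [11700] * 4 + [11600]
print(np.array_equal(sticky_round_rounded_anchor(inputs, 100), expected))  # True
```

Starting the anchor at r[0] instead would change the result. At index 6, 11600.00 is exactly 100 away from 11700, which does not exceed the threshold, so the switch to 11600 would not happen there.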


Timing on an array of 10,000,000 values sampled from the input column (IPython %timeit):

l = np.random.choice(df['input'].values, 10_000_000)

%timeit cumsum_with_threshold(l, 100)
1.54 µs ± 7.93 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)