I have the following dataframe:
date_time value member
2013-10-09 09:00:00 664639 Jerome
2013-10-09 09:05:00 197290 Hence
2013-10-09 09:10:00 470186 Ann
2013-10-09 09:15:00 181314 Mikka
2013-10-09 09:20:00 969427 Cristy
2013-10-09 09:25:00 261473 James
2013-10-09 09:30:00 003698 Oliver
And I have a second dataframe with the bounds:
date_start date_end
2013-10-09 09:19:00 2013-10-09 09:25:00
2013-10-09 09:25:00 2013-10-09 09:40:00
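So I need to create a new column that holds, for each row, the index of the interval (between two datetime points) it belongs to, like this: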
date_time value member session
2013-10-09 09:00:00 664639 Jerome 1
2013-10-09 09:05:00 197290 Hence 1
2013-10-09 09:10:00 470186 Ann 1
2013-10-09 09:15:00 181314 Mikka 2
2013-10-09 09:20:00 969427 Cristy 2
2013-10-09 09:25:00 261473 James 2
2013-10-09 09:30:00 003698 Oliver 2
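The following code creates the column 'session', but it does not write the session index (i.e. the row index in the bounds dataframe) into that column, so it does not split the initial dataframe by time interval: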
def create_interval():
    df['session']=''
    for index, row in bounds.iterrows():
        s = row['date_start']
        e = row['date_end']
        mask=(df['date'] > s) & (df['date'] < e)
        df.loc[mask]['session']='[index]'
    return df
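
For reference, three things in the loop above look likely to stop it from working: the sample data has a column named date_time rather than date, df.loc[mask]['session'] = ... is chained indexing that assigns to a temporary copy, and '[index]' is a string literal rather than the loop variable. Below is a minimal sketch of a corrected loop. It assumes half-open intervals (start inclusive, end exclusive), passes df and bounds in as parameters, and uses -1 as a placeholder for rows that fall in no interval; none of those choices come from the original post.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2013-10-09 09:00:00", "2013-10-09 09:05:00", "2013-10-09 09:10:00",
        "2013-10-09 09:15:00", "2013-10-09 09:20:00", "2013-10-09 09:25:00",
        "2013-10-09 09:30:00",
    ]),
    "value": [664639, 197290, 470186, 181314, 969427, 261473, 3698],
    "member": ["Jerome", "Hence", "Ann", "Mikka", "Cristy", "James", "Oliver"],
})
bounds = pd.DataFrame({
    "date_start": pd.to_datetime(["2013-10-09 09:19:00", "2013-10-09 09:25:00"]),
    "date_end": pd.to_datetime(["2013-10-09 09:25:00", "2013-10-09 09:40:00"]),
})


def create_interval(df, bounds):
    """Label each row with the index of the bounds row whose interval contains it."""
    out = df.copy()
    out["session"] = -1  # assumption: -1 marks rows outside every interval
    for index, row in bounds.iterrows():
        # Assumption: intervals are [date_start, date_end).
        mask = (out["date_time"] >= row["date_start"]) & (out["date_time"] < row["date_end"])
        # A single .loc call with (rows, column) writes into `out` itself.
        out.loc[mask, "session"] = index
    return out


labelled = create_interval(df, bounds)
print(labelled)
```

The loop is easy to read but visits every bounds row in Python, so it may become slow when bounds is large.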
UPDATE
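The problem with the code bounds['date_start'].searchsorted(df['date_time']) is that it does not give the result I want, i.e. one index value per interval: df['Session'] = 1 for the first interval, = 2 for the second, and so on. The session column is meant to separate the different intervals that lie between date_start and date_end in bounds.
I think that whenever df['date_time'] differs from bounds['date_start'] it already increments the session index, which is not what I am looking for.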
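Best answer: I assume you want the actual index position (zero-based). You can call apply on your 'date_time' column and call np.searchsorted to find the index position in the bounds df where each value falls: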
In [266]:
df['Session'] = df['date_time'].apply(lambda x: np.searchsorted(bounds['date_start'], x)[0])
df
Out[266]:
date_time value member Session
0 2013-10-09 09:00:00 664639 Jerome 0
1 2013-10-09 09:05:00 197290 Hence 0
2 2013-10-09 09:10:00 470186 Ann 0
3 2013-10-09 09:15:00 181314 Mikka 0
4 2013-10-09 09:20:00 969427 Cristy 1
5 2013-10-09 09:25:00 261473 James 1
6 2013-10-09 09:30:00 3698 Oliver 2
EDIT
@Jeff has pointed out that apply is unnecessary, and of course he is right. This will be faster:
In [293]:
df['session'] = bounds['date_start'].searchsorted(df['date_time'])
df
Out[293]:
date_time value member session
0 2013-10-09 09:00:00 664639 Jerome 0
1 2013-10-09 09:05:00 197290 Hence 0
2 2013-10-09 09:10:00 470186 Ann 0
3 2013-10-09 09:15:00 181314 Mikka 0
4 2013-10-09 09:20:00 969427 Cristy 1
5 2013-10-09 09:25:00 261473 James 1
6 2013-10-09 09:30:00 3698 Oliver 2
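
The vectorized call above only looks at date_start, so it returns an insertion position rather than interval membership, which seems to be what the UPDATE in the question is objecting to. The sketch below is one possible way to also take date_end into account. It assumes the intervals in bounds are sorted by date_start, do not overlap, and are half-open (start inclusive, end exclusive). It numbers sessions from 1 as the question asks and uses 0 for rows outside every interval. The column names session_left and session are chosen for illustration only.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2013-10-09 09:00:00", "2013-10-09 09:05:00", "2013-10-09 09:10:00",
        "2013-10-09 09:15:00", "2013-10-09 09:20:00", "2013-10-09 09:25:00",
        "2013-10-09 09:30:00",
    ]),
    "member": ["Jerome", "Hence", "Ann", "Mikka", "Cristy", "James", "Oliver"],
})
bounds = pd.DataFrame({
    "date_start": pd.to_datetime(["2013-10-09 09:19:00", "2013-10-09 09:25:00"]),
    "date_end": pd.to_datetime(["2013-10-09 09:25:00", "2013-10-09 09:40:00"]),
})

times = df["date_time"].to_numpy()
starts = bounds["date_start"].to_numpy()
ends = bounds["date_end"].to_numpy()

# Insertion positions with the default side="left", as in the answer above.
df["session_left"] = np.searchsorted(starts, times, side="left")

# side="right" minus one gives the last interval whose start is <= the timestamp
# (-1 when the timestamp precedes every start).
pos = np.searchsorted(starts, times, side="right") - 1

# A row belongs to that interval only if it also lies before the interval's end.
inside = (pos >= 0) & (times < ends[np.clip(pos, 0, None)])

# Assumption: 1-based session numbers, 0 for rows that fall in no interval.
df["session"] = np.where(inside, pos + 1, 0)
print(df)
```

With the sample data, the rows from 09:00 to 09:15 fall before the first date_start and get 0, 09:20 falls in the first interval, and 09:25 and 09:30 fall in the second.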