Suppose I have a list of events that occur on different keys.
import pandas

data = [
    {"key": "A", "event": "created"},
    {"key": "A", "event": "updated"},
    {"key": "A", "event": "updated"},
    {"key": "A", "event": "updated"},
    {"key": "B", "event": "created"},
    {"key": "B", "event": "updated"},
    {"key": "B", "event": "updated"},
    {"key": "C", "event": "created"},
    {"key": "C", "event": "updated"},
    {"key": "C", "event": "updated"},
    {"key": "C", "event": "updated"},
    {"key": "C", "event": "updated"},
    {"key": "C", "event": "updated"},
]
df = pandas.DataFrame(data)
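I want to index my DataFrame first by key and then by an enumeration that restarts within each key. It looks like a simple unstack operation, but I can't work out how to do it properly.
The best I have managed is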
df.set_index("key", append=True).swaplevel(0, 1)
          event
key
A   0   created
    1   updated
    2   updated
    3   updated
B   4   created
    5   updated
    6   updated
C   7   created
    8   updated
    9   updated
    10  updated
    11  updated
    12  updated
But what I expect is
         event
key
A   0  created
    1  updated
    2  updated
    3  updated
B   0  created
    1  updated
    2  updated
C   0  created
    1  updated
    2  updated
    3  updated
    4  updated
    5  updated
I also tried something like
df.groupby("key")["key"].count().apply(range).apply(pandas.Series).stack()
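but the row order is not preserved, so I can't apply the result as an index. Besides, it feels like overkill for what looks like a very standard operation...
Any ideas?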
Best answer: use groupby with cumcount.
Here are a couple of ways to do it:
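# Newer version (thanks to @ScottBoston): build the index directly from
# the key column and a per-key running counter
df = df.set_index(['key', df.groupby('key').cumcount()])\
       .rename_axis(['key', 'count'])
# Original version (an alternative to the above; apply one or the other
# to the original df, not both in sequence)
df = df.assign(count=df.groupby('key').cumcount())\
       .set_index(['key', 'count'])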
print(df)
             event
key count
A   0      created
    1      updated
    2      updated
    3      updated
B   0      created
    1      updated
    2      updated
C   0      created
    1      updated
    2      updated
    3      updated
    4      updated
    5      updated
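To make it clearer why cumcount fits here, the following minimal sketch looks at the counter on its own. It numbers the rows of each group 0, 1, 2, ... in their original row order, so it still works when the keys are interleaved. The small interleaved dataset and the names `events`, `position`, and `indexed` are made up for illustration and are not part of the question.

```python
import pandas as pd

# Hypothetical data with interleaved keys (not the data from the question)
events = pd.DataFrame([
    {"key": "A", "event": "created"},
    {"key": "A", "event": "updated"},
    {"key": "B", "event": "created"},
    {"key": "A", "event": "updated"},
    {"key": "B", "event": "updated"},
])

# cumcount returns, for every row, its 0-based position within its group,
# aligned with the original row index
position = events.groupby("key").cumcount()
print(position.tolist())  # [0, 1, 0, 2, 1]

# Use the key column plus the counter as a two-level index,
# then sort so that rows with the same key sit together
indexed = (
    events.set_index(["key", position])
          .rename_axis(["key", "count"])
          .sort_index()
)
print(indexed)
```

The sort_index call is only needed when the keys are not already contiguous, as in this made-up data; with the data from the question the rows are already grouped by key.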