import pandas as pd
test = pd.DataFrame({'injury':['A', 'B', 'B', 'A', 'A', 'C', 'A', 'B', 'A'], 'crash_drinking':[1, 1, 1, 0, 0, 0, 1, 0, 1], 'crash_drugs':[0,0,0,1,1,0,0,1,1], 'driver_drinking':[1,1,0,0,0,0,0,1,0], 'driver_drugged':[0,0,0,0,1,0,0,1,0]})
   crash_drinking  crash_drugs  driver_drinking  driver_drugged  injury
0               1            0                1               0       A
1               1            0                1               0       B
2               1            0                0               0       B
3               0            1                0               0       A
4               0            1                0               1       A
5               0            0                0               0       C
6               1            0                0               0       A
7               0            1                1               1       B
8               1            1                0               0       A
I'd like my output to look like this (column names changed to distinguish them from the dataframe above):
   drinking crash  drinking driver in crash  drugged crash  drugged driver in crash
A               2                         1              2                        1
B               2                         1              1                        0
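For the first row, injury = 'A' and the following filters apply:
"drinking crash" is the count of rows where crash_drinking = 1 and crash_drugs = 0;
"drinking driver in crash" is the count where crash_drinking = 1, crash_drugs = 0, driver_drinking = 1, and driver_drugged = 0;
"drugged crash" is the count where crash_drinking = 0 and crash_drugs = 1;
"drugged driver in crash" is the count where crash_drinking = 0, crash_drugs = 1, driver_drinking = 0, and driver_drugged = 1.
Row B uses the same filters, except with injury = 'B'.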
Right now I'm just setting up a bunch of .loc filters:
test.loc[(test['injury'] == 'A') & (test['crash_drinking'] == 1) & (test['crash_drugs'] == 0)]
test.loc[(test['injury'] == 'A') & (test['crash_drinking'] == 0) & (test['crash_drugs'] == 1)]
test.loc[(test['injury'] == 'A') & (test['crash_drinking'] == 1) & (test['crash_drugs'] == 0) & (test['driver_drinking'] == 1) & (test['driver_drugged'] == 0)]
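and so on.
I'd rather do this with groupby or .apply(), since I think that would be faster than running each of these queries separately. But I'm not sure of the right syntax. Maybe I should start with a .groupby() on the 'injury' column and go from there...?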
Best answer
# Build one boolean column per category, then count the True values per injury group.
result = pd.DataFrame()
result['drinking crash'] = (test['crash_drinking'] == 1) & (test['crash_drugs'] == 0)
result['drinking driver in crash'] = ((test['crash_drinking'] == 1) & (test['crash_drugs'] == 0)
    & (test['driver_drinking'] == 1) & (test['driver_drugged'] == 0))
result['drugged crash'] = (test['crash_drinking'] == 0) & (test['crash_drugs'] == 1)
result['drugged driver in crash'] = ((test['crash_drinking'] == 0) & (test['crash_drugs'] == 1)
    & (test['driver_drinking'] == 0) & (test['driver_drugged'] == 1))
# Convert True/False to 1/0 so that the per-group sum is a count.
result = result.astype(int)
result['injury'] = test['injury']
# The result has one row per injury value (including 'C', which is all zeros here).
result.groupby('injury').sum()
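The same idea can be written a little more compactly by naming the two crash-level masks once and reusing them. This is only a sketch of a variant, not a different method: the names `drinking_crash`, `drugged_crash`, `masks` and `counts` are made up for illustration, and the sample data is repeated so the snippet runs on its own.

```python
import pandas as pd

test = pd.DataFrame({
    'injury': ['A', 'B', 'B', 'A', 'A', 'C', 'A', 'B', 'A'],
    'crash_drinking': [1, 1, 1, 0, 0, 0, 1, 0, 1],
    'crash_drugs': [0, 0, 0, 1, 1, 0, 0, 1, 1],
    'driver_drinking': [1, 1, 0, 0, 0, 0, 0, 1, 0],
    'driver_drugged': [0, 0, 0, 0, 1, 0, 0, 1, 0],
})

# Crash-level masks, defined once and reused for the driver-level columns.
drinking_crash = test['crash_drinking'].eq(1) & test['crash_drugs'].eq(0)
drugged_crash = test['crash_drinking'].eq(0) & test['crash_drugs'].eq(1)

# One boolean column per category (illustrative names, same filters as above).
masks = pd.DataFrame({
    'drinking crash': drinking_crash,
    'drinking driver in crash': (drinking_crash
                                 & test['driver_drinking'].eq(1)
                                 & test['driver_drugged'].eq(0)),
    'drugged crash': drugged_crash,
    'drugged driver in crash': (drugged_crash
                                & test['driver_drinking'].eq(0)
                                & test['driver_drugged'].eq(1)),
})

# Group the 0/1 columns by the injury values (aligned on the shared index) and sum.
counts = masks.astype(int).groupby(test['injury']).sum()

# Keep only the injury groups of interest.
print(counts.loc[['A', 'B']])
```

Grouping by the `test['injury']` Series directly avoids copying the column into the mask frame. Selecting with `.loc[['A', 'B']]` drops the 'C' row if only those two groups are wanted.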