I need some help working with a pandas DataFrame.
Here is the DataFrame:
group col1 col2 name
1 dog 40 canidae
1 dog 40 canidae
1 dog 40 canidae
1 dog 40 canidae
1 dog 40
1 dog 40 canidae
1 dog 40 canidae
2 frog 85 dendrobatidae
2 frog 89 leptodactylidae
2 frog 89 leptodactylidae
2 frog 82 leptodactylidae
2 frog 89
2 frog 81
2 frog 89 dendrobatidae
3 horse 87 equidae1
3 donkey 76 equidae2
3 zebra 67 equidae3
4 bird 54 psittacidae
4 bird 56
4 bird 34
5 bear 67
5 bear 54
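To make this reproducible, the table above could be built roughly as follows. This is only a sketch: the variable name rows is illustrative, and it assumes the blank cells in the name column are missing values (NaN) rather than empty strings.

```python
import numpy as np
import pandas as pd

# Rows copied from the table above; a blank name is stored as NaN (an assumption).
rows = [
    (1, "dog", 40, "canidae"),
    (1, "dog", 40, "canidae"),
    (1, "dog", 40, "canidae"),
    (1, "dog", 40, "canidae"),
    (1, "dog", 40, np.nan),
    (1, "dog", 40, "canidae"),
    (1, "dog", 40, "canidae"),
    (2, "frog", 85, "dendrobatidae"),
    (2, "frog", 89, "leptodactylidae"),
    (2, "frog", 89, "leptodactylidae"),
    (2, "frog", 82, "leptodactylidae"),
    (2, "frog", 89, np.nan),
    (2, "frog", 81, np.nan),
    (2, "frog", 89, "dendrobatidae"),
    (3, "horse", 87, "equidae1"),
    (3, "donkey", 76, "equidae2"),
    (3, "zebra", 67, "equidae3"),
    (4, "bird", 54, "psittacidae"),
    (4, "bird", 56, np.nan),
    (4, "bird", 34, np.nan),
    (5, "bear", 67, np.nan),
    (5, "bear", 54, np.nan),
]
df = pd.DataFrame(rows, columns=["group", "col1", "col2", "name"])
```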
What I want is to add a column "consensus_name", to get:
group col1 col2 name consensus_name
1 dog 40 canidae canidae
1 dog 40 canidae canidae
1 dog 40 canidae
1 dog 40 canidae canidae
1 dog 40 canidae canidae
2 frog 85 dendrobatidae leptodactylidae
2 frog 89 leptodactylidae leptodactylidae
2 frog 89 leptodactylidae leptodactylidae
2 frog 82 leptodactylidae leptodactylidae
2 frog 89 leptodactylidae
2 frog 81 leptodactylidae
2 frog 89 dendrobatidae leptodactylidae
3 horse 87 equidae1 equidae3
3 donkey 76 equidae2 equidae3
3 zebra 67 equidae3 equidae3
4 bird 54 psittacidae psittacidae
4 bird 56 psittacidae
4 bird 34 psittacidae
5 bear 67 NA
5 bear 54 NA
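To get the new column, for each group I take the most represented name in that group.
> For group 1, there are 4 rows named 'canidae' and one row with nothing, so for every row I write 'canidae' in the consensus_name column.
> For group 2, there are 2 rows named 'dendrobatidae', 2 rows with nothing, and 3 rows named 'leptodactylidae', so for every row I write 'leptodactylidae' in consensus_name.
> For group 3, there are 3 rows with 3 different names, so there is no consensus; the name I take is the one with the lowest col2 value, so I write 'equidae3' in consensus_name.
> For group 4, only one row has a name, so that is the consensus name for the group and I write 'psittacidae' in consensus_name.
> For group 5, there is no information at all, so I just write NA in consensus_name.
Does anyone have an idea how to do this with pandas? Thanks for your help :)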
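The rules listed above can be sketched directly as code. This is a minimal sketch, not taken from any of the answers: the helper consensus_for_group and the small demo frame are hypothetical, and it assumes that missing names are NaN and that a tie is resolved by taking the tied name found on the row with the smallest col2.

```python
import numpy as np
import pandas as pd

def consensus_for_group(g: pd.DataFrame):
    """Most frequent non-missing name; ties go to the row with the lowest col2."""
    named = g.dropna(subset=["name"])
    if named.empty:
        return np.nan  # no information in this group
    counts = named["name"].value_counts()
    tied = counts[counts == counts.max()].index
    # Among the tied names, take the one on the row with the smallest col2.
    candidates = named[named["name"].isin(tied)]
    return candidates.sort_values("col2", kind="stable")["name"].iloc[0]

# Hypothetical demo data: groups 2, 3 and 5 from the question.
df = pd.DataFrame({
    "group": [2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 5, 5],
    "col2": [85, 89, 89, 82, 89, 81, 89, 87, 76, 67, 67, 54],
    "name": ["dendrobatidae", "leptodactylidae", "leptodactylidae",
             "leptodactylidae", np.nan, np.nan, "dendrobatidae",
             "equidae1", "equidae2", "equidae3", np.nan, np.nan],
})

# One consensus value per group, then broadcast back to every row.
consensus = {key: consensus_for_group(g) for key, g in df.groupby("group")}
df["consensus_name"] = df["group"].map(consensus)
```

Computing one value per group and mapping it back keeps the original row order and avoids adding helper columns.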
The output with anky's answer is:
group col1 col2 name consensus_name
0 1 dog 40 canidae canidae
1 1 dog 40 canidae canidae
2 1 dog 40 canidae canidae
3 1 dog 40 canidae canidae
4 1 dog 40 NaN canidae
5 1 dog 40 canidae canidae
6 1 dog 40 canidae canidae
7 2 frog 85 dendrobatidae dendrobatidae
8 2 frog 89 leptodactylidae leptodactylidae
9 2 frog 89 leptodactylidae leptodactylidae
10 2 frog 82 leptodactylidae leptodactylidae
11 2 frog 89 NaN leptodactylidae
12 2 frog 81 NaN leptodactylidae
13 2 frog 89 dendrobatidae dendrobatidae
14 3 horse 87 equidae1 equidae1
15 3 donkey 76 equidae2 equidae2
16 3 zebra 67 equidae3 equidae3
17 4 bird 54 psittacidae psittacidae
18 4 bird 56 NaN psittacidae
19 4 bird 34 NaN psittacidae
20 5 bear 67 NaN NaN
21 5 bear 54 NaN NaN
Best answer: Use SeriesGroupBy.transform (that is, df.groupby(...)[column].transform) and pass it 'max':
# First replace missing names with an empty string so that 'max' only compares strings
df['name'] = df['name'].fillna('')
df['consensus_name'] = df.groupby('group')['name'].transform('max')
print(df)
group col1 col2 name consensus_name
0 1 dog 40 canidae canidae
1 1 dog 40 canidae canidae
2 1 dog 40 canidae canidae
3 1 dog 40 canidae canidae
4 1 dog 40 canidae
5 1 dog 40 canidae canidae
6 1 dog 40 canidae canidae
7 2 frog 85 dendrobatidae leptodactylidae
8 2 frog 89 leptodactylidae leptodactylidae
9 2 frog 89 leptodactylidae leptodactylidae
10 2 frog 82 leptodactylidae leptodactylidae
11 2 frog 89 leptodactylidae
12 2 frog 81 leptodactylidae
13 2 frog 89 dendrobatidae leptodactylidae
14 3 horse 87 equidae1 equidae3
15 3 donkey 76 equidae2 equidae3
16 3 zebra 67 equidae3 equidae3
17 4 bird 54 psittacidae psittacidae
18 4 bird 56 psittacidae
19 4 bird 34 psittacidae
20 5 bear 67
21 5 bear 54
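Worth keeping in mind: 'max' on strings is a lexicographic maximum, not a frequency count, so it matches the wanted result here only because of how these particular names happen to sort. A small illustration with made-up data (df_demo and its values are hypothetical) shows the two disagreeing:

```python
import pandas as pd

# Hypothetical data: 'ant' is the most frequent name, 'zebra' sorts last.
df_demo = pd.DataFrame({
    "group": [1, 1, 1],
    "name": ["zebra", "ant", "ant"],
})

# Lexicographic maximum per group, broadcast to every row.
by_max = df_demo.groupby("group")["name"].transform("max")

# Most frequent name per group (first mode), broadcast to every row.
by_mode = df_demo.groupby("group")["name"].transform(lambda s: s.mode().iloc[0])

print(by_max.tolist())   # ['zebra', 'zebra', 'zebra']
print(by_mode.tolist())  # ['ant', 'ant', 'ant']
```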
Edit, after it was pointed out that the above does not work in general:
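import pandas as pd

# Fill missing names from the previous row within each group
df['name'] = df.groupby('group')['name'].ffill()
# One row per modal name per group; level_1 numbers the modes returned by mode()
df_group = df.groupby('group')['name'].apply(lambda x: x.mode(dropna=False)).reset_index()
# On ties, keep the last mode (the one with the highest level_1)
df_group = df_group[df_group.level_1 == df_group.groupby('group').level_1.transform('max')]
df_group = df_group.rename(columns={'name': 'consensus_name'})
df_final = pd.merge(df, df_group, on='group')
print(df_final)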
group col1 col2 name level_1 consensus_name
0 1 dog 40 canidae 0 canidae
1 1 dog 40 canidae 0 canidae
2 1 dog 40 canidae 0 canidae
3 1 dog 40 canidae 0 canidae
4 1 dog 40 canidae 0 canidae
5 1 dog 40 canidae 0 canidae
6 1 dog 40 canidae 0 canidae
7 2 frog 85 dendrobatidae 0 leptodactylidae
8 2 frog 89 leptodactylidae 0 leptodactylidae
9 2 frog 89 leptodactylidae 0 leptodactylidae
10 2 frog 82 leptodactylidae 0 leptodactylidae
11 2 frog 89 leptodactylidae 0 leptodactylidae
12 2 frog 81 leptodactylidae 0 leptodactylidae
13 2 frog 89 dendrobatidae 0 leptodactylidae
14 3 horse 87 equidae1 2 equidae3
15 3 donkey 76 equidae2 2 equidae3
16 3 zebra 67 equidae3 2 equidae3
17 4 bird 54 psittacidae 0 psittacidae
18 4 bird 56 psittacidae 0 psittacidae
19 4 bird 34 psittacidae 0 psittacidae
20 5 bear 67 NaN 0 NaN
21 5 bear 54 NaN 0 NaN