我生成了两个(数百个)df,然后连接起来,然后我想按顺序排列具有相同列D名称的行:
In [120]: df_list[0]
Out[120]:
A B C D
0 0.564678 0.598355 0.606693 MA0835
1 0.066291 0.063587 0.662292 MA0835
2 0.000000 0.000000 0.010758 MA0835
3 0.000000 0.000000 0.097895 MA0835
4 0.000000 0.000000 0.136468 MA0835
In [121]: df_list[1]
Out[121]:
A B C D
0 0.628844 0.614492 0.570333 MA1002
1 0.317790 0.293189 0.239368 MA1002
2 0.000000 0.000000 0.000000 MA1002
3 0.000000 0.000000 0.000000 MA1002
4 0.000000 0.000000 0.000000 MA1002
In [122]: df = pd.concat(df_list[0:2])
In [122]: df
Out[122]:
A B C D
0 0.564678 0.598355 0.606693 MA0835
1 0.066291 0.063587 0.662292 MA0835
2 0.000000 0.000000 0.010758 MA0835
3 0.000000 0.000000 0.097895 MA0835
4 0.000000 0.000000 0.136468 MA0835
0 0.628844 0.614492 0.570333 MA1002
1 0.317790 0.293189 0.239368 MA1002
2 0.000000 0.000000 0.000000 MA1002
3 0.000000 0.000000 0.000000 MA1002
4 0.000000 0.000000 0.000000 MA1002
标准分类产生:
In [125]: df.sort_values('A',ascending=False)
Out[125]:
A B C D
0 0.628844 0.614492 0.570333 MA1002
0 0.564678 0.598355 0.606693 MA0835
1 0.317790 0.293189 0.239368 MA1002
1 0.066291 0.063587 0.662292 MA0835
2 0.000000 0.000000 0.010758 MA0835
3 0.000000 0.000000 0.097895 MA0835
4 0.000000 0.000000 0.136468 MA0835
2 0.000000 0.000000 0.000000 MA1002
3 0.000000 0.000000 0.000000 MA1002
4 0.000000 0.000000 0.000000 MA1002
但是,我想对A进行排序并保持D指定的行分组.这是所需的输出:
A B C D
0 0.628844 0.614492 0.570333 MA1002
1 0.317790 0.293189 0.239368 MA1002
2 0.000000 0.000000 0.000000 MA1002
3 0.000000 0.000000 0.000000 MA1002
4 0.000000 0.000000 0.000000 MA1002
0 0.564678 0.598355 0.606693 MA0835
1 0.066291 0.063587 0.662292 MA0835
2 0.000000 0.000000 0.010758 MA0835
3 0.000000 0.000000 0.097895 MA0835
4 0.000000 0.000000 0.136468 MA0835
我是否需要使用groupby,还是有其他我不熟悉的排序/分组技术?
最佳答案 使用pd.concat中的keys参数
keys = [(df.A.iloc[0], i) for i, df in enumerate(list_of_dfs)]
pd.concat(list_of_dfs, keys=keys) \
.sort_index(ascending=[False, True, True]) \
.reset_index(drop=True)