I have a DataFrame (totaldf) that looks like this:
... Hom ... March Plans March Ships April Plans April Ships ...
0 CAD ... 12 5 4 13
1 USA ... 7 6 2 11
2 CAD ... 4 9 6 14
3 CAD ... 13 3 9 7
... ... ... ... ... ... ...
and so on for every month of the year. I would like it to look like this:
... Hom ... Month Plans Ships ...
0 CAD ... March 12 5
1 USA ... March 7 6
2 CAD ... March 4 9
3 CAD ... March 13 3
4 CAD ... April 4 13
5 USA ... April 2 11
6 CAD ... April 6 14
7 CAD ... April 9 7
... ... ... ... ... ...
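Is there a simple way to split the string entries?
I have played around with totaldf.unstack(), but because there are multiple columns I am not sure how to reindex the DataFrame correctly.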
Best answer: You can use pd.wide_to_long, with a little extra work to get the stub names right. As the documentation says:
The stub name(s). The wide format variables are assumed to start with the stub names.
So the column names need to be modified slightly so that the stub name comes at the start of each column name:
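import pandas as pd

# Boolean mask selecting the "<Month> Plans" / "<Month> Ships" columns
m = df.columns.str.contains('Plans|Ships')
# Assign a fresh list of names (rather than mutating df.columns.values in place),
# with the stub moved to the front, e.g. "March Plans" -> "PlansMarch"
df.columns = [''.join(c.split(' ')[::-1]) if wide else c for c, wide in zip(df.columns, m)]
print(df)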
Hom PlansMarch ShipsMarch PlansApril ShipsApril
0 CAD 12 5 4 13
1 USA 7 6 2 11
2 CAD 4 9 6 14
3 CAD 13 3 9 7
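The renaming step can also be written as a small function, which keeps the rule ("month, space, stub" becomes "stub, month") in one place. This is only a sketch: the helper name stub_first is made up for illustration, and it assumes every wide column is named exactly "<Month> Plans" or "<Month> Ships".

```python
import re

# Hypothetical helper, not part of pandas: matches names such as "March Plans".
_MONTH_STUB = re.compile(r'^(\w+) (Plans|Ships)$')

def stub_first(name):
    """Move the stub (Plans/Ships) in front of the month; other names pass through unchanged."""
    return _MONTH_STUB.sub(r'\2\1', name)

# Column names taken from the question.
columns = ['Hom', 'March Plans', 'March Ships', 'April Plans', 'April Ships']
renamed = [stub_first(c) for c in columns]
print(renamed)
```

On a real DataFrame the same function could be passed to rename, for example df = df.rename(columns=stub_first), which returns a new DataFrame instead of modifying the column index in place.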
Now you can call pd.wide_to_long with ['Ships', 'Plans'] as the stub names to get the desired output:
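out = (pd.wide_to_long(df.reset_index(), stubnames=['Ships', 'Plans'],
                       i='index', j='Month', suffix=r'\w+')  # raw string so \w+ reaches the regex engine intact
         .reset_index(level=0, drop=True)  # drop the helper 'index' level
         .reset_index())                   # turn Month back into a regular column
print(out)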
 Month Hom Ships Plans
0 March CAD 5 12
1 March USA 6 7
2 March CAD 9 4
3 March CAD 3 13
4 April CAD 13 4
5 April USA 11 2
6 April CAD 14 6
7 April CAD 7 9
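For reference, here is the whole approach as one self-contained script. It is a sketch under a few assumptions: the sample frame contains only the columns visible in the question (the elided "..." columns are left out), and the final column reordering is added here just to match the layout asked for in the question.

```python
import pandas as pd

# Sample data copied from the question (only the columns shown there).
df = pd.DataFrame({
    'Hom': ['CAD', 'USA', 'CAD', 'CAD'],
    'March Plans': [12, 7, 4, 13],
    'March Ships': [5, 6, 9, 3],
    'April Plans': [4, 2, 6, 9],
    'April Ships': [13, 11, 14, 7],
})

# Put the stub first so wide_to_long can find it: "March Plans" -> "PlansMarch".
df.columns = [
    ''.join(reversed(c.split(' '))) if c.endswith(('Plans', 'Ships')) else c
    for c in df.columns
]

long_df = (
    pd.wide_to_long(
        df.reset_index(),          # the old row index becomes the id column 'index'
        stubnames=['Plans', 'Ships'],
        i='index',
        j='Month',
        suffix=r'\w+',             # month names are text, so the default numeric suffix would not match
    )
    .reset_index(level='index', drop=True)  # discard the helper id level
    .reset_index()                           # turn Month into a regular column
)

# Reorder the columns to match the layout requested in the question.
long_df = long_df[['Hom', 'Month', 'Plans', 'Ships']]
print(long_df)
```

Resetting the index before the call matters because wide_to_long needs an id column that uniquely identifies each row, and Hom alone does not (CAD appears several times).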