python – Unpivot Dataframe w / Partial String

2023年7月13日 298次阅读

我有一个数据帧(totaldf),这样：

           ...     Hom   ...    March Plans   March Ships   April Plans   April Ships   ...

0                  CAD   ...    12              5           4             13
1                  USA   ...    7               6           2             11
2                  CAD   ...    4               9           6             14
3                  CAD   ...    13              3           9             7
...                ...   ...    ...             ...         ...           ...

一年中的所有月份.我希望它是：

           ...     Hom   ...    Month   Plans    Ships    ...

0                  CAD   ...    March    12          5             
1                  USA   ...    March    7           6             
2                  CAD   ...    March    4           9             
3                  CAD   ...    March    13          3
4                  CAD   ...    April    4           13            
5                  USA   ...    April    2           11             
6                  CAD   ...    April    6           14             
7                  CAD   ...    April    9           7
...                ...   ...    ...      ...         ...

有没有分割字符串条目的简单方法？
我玩过totaldf.unstack(),但由于有多个列,我不确定如何正确地重新索引数据帧.

最佳答案您可以使用
pd.wide_to_long,稍加一些工作以获得正确的存根名称,如文档中所述：

The stub name(s). The wide format variables are assumed to start with the stub names.

因此,有必要稍微修改列名,以便存根名在每个列名的开头：

m = df.columns.str.contains('Plans|Ships')
cols = df.columns[m].str.split(' ')
df.columns.values[m] = [w+month for month, w in cols]

print(df)
   Hom  PlansMarch  ShipsMarch  PlansApril  ShipsApril
0  CAD          12           5           4          13
1  USA           7           6           2          11
2  CAD           4           9           6          14
3  CAD          13           3           9           7

现在,您可以使用[‘Ships’,’Plans’]作为存根名使用pd.wide_to_long,以获得所需的输出：

((pd.wide_to_long(df.reset_index(), stubnames=['Ships', 'Plans'], i = 'index', 
                j = 'Month', suffix='\w+')).reset_index(drop=True, level=0)
                .reset_index())

x  Month  Hom  Ships  Plans
0  March  CAD      5     12
1  March  USA      6      7
2  March  CAD      9      4
3  March  CAD      3     13
4  April  CAD     13      4
5  April  USA     11      2
6  April  CAD     14      6
7  April  CAD      7      9