python – Groupby and pivot a Pandas table

May 3, 2023

This should be quick, but none of the pivot/groupby attempts I've made has produced what I need.

I have a table like this:

        Letter  Period  Amount
YrMnth
2014-12      B       6       0
2014-12      C       8       1
2014-12      C       9       2
2014-12      C      10       3
2014-12      C       6       4
2014-12      C      12       5
2014-12      C       7       6
2014-12      C      11       7
2014-12      D       9       8
2014-12      D      10       9
2014-12      D       1      10
2014-12      D       8      11
2014-12      D       6      12
2014-12      D      12      13
2014-12      D       7      14
2014-12      D      11      15
2014-12      D       4      16
2014-12      D       3      17
2015-01      B       7      18
2015-01      B       8      19
2015-01      B       1      20
2015-01      B      10      21
2015-01      B      11      22
2015-01      B       6      23
2015-01      B       9      24
2015-01      B       3      25
2015-01      B       5      26
2015-01      C      10      27

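I want to pivot it so that the index is essentially YrMnth and Letter, Period becomes the columns, and Amount supplies the values.

I understand pivot in general, but I get an error when I try to use more than one index column. I turned the index into a regular column and tried this: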

In [76]: df.pivot(index=['YrMnth','Letter'], values='Amount', columns='Period')

But I got this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-76-fc2a4c5f244d> in <module>()
----> 1 df.pivot(index=['YrMnth','Letter'], values='Amount', columns='Period')

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in pivot(self, index, columns, values)
   3761         """
   3762         from pandas.core.reshape import pivot
-> 3763         return pivot(self, index=index, columns=columns, values=values)
   3764
   3765     def stack(self, level=-1, dropna=True):

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/reshape.pyc in pivot(self, index, columns, values)
    331         indexed = Series(self[values].values,
    332                          index=MultiIndex.from_arrays([index,
--> 333                                                        self[columns]]))
    334         return indexed.unstack(columns)
    335

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in __init__(self, data, index, dtype, name, copy, fastpath)
    225                                        raise_cast_failure=True)
    226
--> 227                 data = SingleBlockManager(data, index, fastpath=True)
    228
    229         generic.NDFrame.__init__(self, data, fastpath=True)

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in __init__(self, block, axis, do_integrity_check, fastpath)
   3734             block = make_block(block,
   3735                                placement=slice(0, len(axis)),
-> 3736                                ndim=1, fastpath=True)
   3737
   3738         self.blocks = [block]

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in make_block(values, placement, klass, ndim, dtype, fastpath)
   2452
   2453     return klass(values, ndim=ndim, fastpath=fastpath,
-> 2454                  placement=placement)
   2455
   2456

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in __init__(self, values, placement, ndim, fastpath)
     85             raise ValueError('Wrong number of items passed %d,'
     86                              ' placement implies %d' % (
---> 87                                  len(self.values), len(self.mgr_locs)))
     88
     89     @property

ValueError: Wrong number of items passed 138, placement implies 2

Best answer

If I understand correctly, pivot_table is probably closer to what you need:

df = df.pivot_table(index=["YrMnth", "Letter"], columns="Period", values="Amount")

Which gives you:

Period          1   3   4   5   6   7   8   9   10  11  12
YrMnth  Letter                                            
2014-12 B      NaN NaN NaN NaN   0 NaN NaN NaN NaN NaN NaN
        C      NaN NaN NaN NaN   4   6   1   2   3   7   5
        D       10  17  16 NaN  12  14  11   8   9  15  13
2015-01 B       20  25 NaN  26  23  18  19  24  21  22 NaN
        C      NaN NaN NaN NaN NaN NaN NaN NaN  27 NaN NaN
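
To make the call above easy to try, here is a minimal sketch that rebuilds a few rows of the question's table and pivots them. The last row is a hypothetical duplicate (not in the original data), added only to show that pivot_table aggregates repeated index/column pairs, using the mean by default, instead of raising an error.

```python
import pandas as pd

# A handful of rows from the question's table, with YrMnth as a regular column.
# The final row is a made-up duplicate of (2015-01, C, 10) for illustration.
df = pd.DataFrame({
    "YrMnth": ["2014-12", "2014-12", "2014-12", "2015-01", "2015-01", "2015-01"],
    "Letter": ["B", "C", "C", "B", "C", "C"],
    "Period": [6, 8, 9, 7, 10, 10],
    "Amount": [0, 1, 2, 18, 27, 29],
})

# Rows are indexed by (YrMnth, Letter); each distinct Period becomes a column.
# Missing combinations are filled with NaN; duplicates are averaged.
wide = df.pivot_table(index=["YrMnth", "Letter"], columns="Period", values="Amount")
print(wide)
```

If averaging is not what you want for repeated pairs, pivot_table takes an aggfunc argument (for example aggfunc="sum").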

Or, as suggested in the comments:

df = pd.pivot_table(df, index=["YrMnth", "Letter"], columns="Period", values="Amount")


Period          1   3   4   5   6   7   8   9   10  11  12
YrMnth  Letter                                            
2014-12 B      NaN NaN NaN NaN   0 NaN NaN NaN NaN NaN NaN
        C      NaN NaN NaN NaN   4   6   1   2   3   7   5
        D       10  17  16 NaN  12  14  11   8   9  15  13
2015-01 B       20  25 NaN  26  23  18  19  24  21  22 NaN
        C      NaN NaN NaN NaN NaN NaN NaN NaN  27 NaN NaN

This also produces the same result. It would be good if someone could clarify how the former fails.
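
On why df.pivot raised: judging from the traceback in the question, that older df.pivot seems to expect a single column label for index, so the list of two labels ended up describing an index of length 2 for 138 values, hence "Wrong number of items passed 138, placement implies 2". The traceback also shows that pivot finishes with an unstack, so one way to get the same reshape without aggregation is to set the index yourself and unstack Period. This is a sketch under that reading, using a few rows of the sample data:

```python
import pandas as pd

# A few rows from the question's table, with YrMnth as a regular column.
df = pd.DataFrame({
    "YrMnth": ["2014-12", "2014-12", "2014-12", "2015-01", "2015-01"],
    "Letter": ["B", "C", "C", "B", "C"],
    "Period": [6, 8, 9, 7, 10],
    "Amount": [0, 1, 2, 18, 27],
})

# Move all three key columns into the index, select Amount,
# then rotate the Period level out into the columns.
wide = df.set_index(["YrMnth", "Letter", "Period"])["Amount"].unstack("Period")
print(wide)
```

Unlike pivot_table, unstack does no aggregation and raises a ValueError if a (YrMnth, Letter, Period) combination appears more than once. As far as I know, pandas 1.1.0 and later also let df.pivot take a list for index directly.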