python - Groupby and Pivot a Pandas Table

Tags: python pandas

This should be quick, but none of the pivot/groupby approaches I have tried gives me what I need.

I have a table like this:

        Letter  Period  Amount
YrMnth
2014-12      B       6       0
2014-12      C       8       1
2014-12      C       9       2
2014-12      C      10       3
2014-12      C       6       4
2014-12      C      12       5
2014-12      C       7       6
2014-12      C      11       7
2014-12      D       9       8
2014-12      D      10       9
2014-12      D       1      10
2014-12      D       8      11
2014-12      D       6      12
2014-12      D      12      13
2014-12      D       7      14
2014-12      D      11      15
2014-12      D       4      16
2014-12      D       3      17
2015-01      B       7      18
2015-01      B       8      19
2015-01      B       1      20
2015-01      B      10      21
2015-01      B      11      22
2015-01      B       6      23
2015-01      B       9      24
2015-01      B       3      25
2015-01      B       5      26
2015-01      C      10      27

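I want to pivot it so that the index is essentially YrMnth and Letter, Period becomes the columns, and Amount supplies the values.

I understand pivot in general, but I get an error when I try to use more than one index column. I turned the index into a regular column and tried this: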

In [76]: df.pivot(index=['YrMnth','Letter'], values='Amount', columns='Period')

But I got this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-76-fc2a4c5f244d> in <module>()
----> 1 df.pivot(index=['YrMnth','Letter'], values='Amount', columns='Period')

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in pivot(self, index, columns, values)
   3761         """
   3762         from pandas.core.reshape import pivot
-> 3763         return pivot(self, index=index, columns=columns, values=values)
   3764
   3765     def stack(self, level=-1, dropna=True):

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/reshape.pyc in pivot(self, index, columns, values)
    331         indexed = Series(self[values].values,
    332                          index=MultiIndex.from_arrays([index,
--> 333                                                        self[columns]]))
    334         return indexed.unstack(columns)
    335

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in __init__(self, data, index, dtype, name, copy, fastpath)
    225                                        raise_cast_failure=True)
    226
--> 227                 data = SingleBlockManager(data, index, fastpath=True)
    228
    229         generic.NDFrame.__init__(self, data, fastpath=True)

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in __init__(self, block, axis, do_integrity_check, fastpath)
   3734             block = make_block(block,
   3735                                placement=slice(0, len(axis)),
-> 3736                                ndim=1, fastpath=True)
   3737
   3738         self.blocks = [block]

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in make_block(values, placement, klass, ndim, dtype, fastpath)
   2452
   2453     return klass(values, ndim=ndim, fastpath=fastpath,
-> 2454                  placement=placement)
   2455
   2456

/Users/chaseschwalbach/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in __init__(self, values, placement, ndim, fastpath)
     85             raise ValueError('Wrong number of items passed %d,'
     86                              ' placement implies %d' % (
---> 87                                  len(self.values), len(self.mgr_locs)))
     88
     89     @property

ValueError: Wrong number of items passed 138, placement implies 2

Accepted answer

If I understand correctly, pivot_table may be closer to what you need:

df = df.pivot_table(index=["YrMnth", "Letter"], columns="Period", values="Amount")

This gives you:

Period          1   3   4   5   6   7   8   9   10  11  12
YrMnth  Letter                                            
2014-12 B      NaN NaN NaN NaN   0 NaN NaN NaN NaN NaN NaN
        C      NaN NaN NaN NaN   4   6   1   2   3   7   5
        D       10  17  16 NaN  12  14  11   8   9  15  13
2015-01 B       20  25 NaN  26  23  18  19  24  21  22 NaN
        C      NaN NaN NaN NaN NaN NaN NaN NaN  27 NaN NaN
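One thing worth knowing here: pivot_table aggregates, and its default aggfunc is the mean, so repeated (YrMnth, Letter, Period) combinations are averaged silently rather than raising an error. Below is a minimal sketch on a handful of rows from the question; the extra duplicate row is hypothetical and only there to show the aggregation.

```python
import pandas as pd

# A small subset of the rows from the question, with YrMnth as a regular column.
df = pd.DataFrame({
    "YrMnth": ["2014-12", "2014-12", "2014-12", "2015-01", "2015-01"],
    "Letter": ["B", "C", "C", "B", "C"],
    "Period": [6, 8, 9, 7, 10],
    "Amount": [0, 1, 2, 18, 27],
})

# Same call as in the answer; combinations that do not occur become NaN.
wide = df.pivot_table(index=["YrMnth", "Letter"], columns="Period", values="Amount")
print(wide)

# Hypothetical extra row (not in the question) repeating (2014-12, C, 8).
extra = pd.DataFrame(
    {"YrMnth": ["2014-12"], "Letter": ["C"], "Period": [8], "Amount": [3]}
)
with_dup = pd.concat([df, extra], ignore_index=True)

# Default aggfunc is the mean, so the two amounts (1 and 3) are averaged.
averaged = with_dup.pivot_table(
    index=["YrMnth", "Letter"], columns="Period", values="Amount"
)

# Pass aggfunc explicitly if a different aggregation is wanted.
summed = with_dup.pivot_table(
    index=["YrMnth", "Letter"], columns="Period", values="Amount", aggfunc="sum"
)
print(averaged.loc[("2014-12", "C"), 8], summed.loc[("2014-12", "C"), 8])
```

If every combination is supposed to be unique, as it appears to be in the question's data, the aggregation changes nothing, but it can hide duplicates you did not expect.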

As suggested in the comments:

df = pd.pivot_table(df, index=["YrMnth", "Letter"], columns="Period", values="Amount")

Period          1   3   4   5   6   7   8   9   10  11  12
YrMnth  Letter                                            
2014-12 B      NaN NaN NaN NaN   0 NaN NaN NaN NaN NaN NaN
        C      NaN NaN NaN NaN   4   6   1   2   3   7   5
        D       10  17  16 NaN  12  14  11   8   9  15  13
2015-01 B       20  25 NaN  26  23  18  19  24  21  22 NaN
        C      NaN NaN NaN NaN NaN NaN NaN NaN  27 NaN NaN

also yields the same result. If anyone can explain why the former fails, that would be great.
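I can't say for certain why the df.pivot call from the question fails, but the traceback is consistent with that older pandas version expecting a single column name for index rather than a list. As far as I know, pandas 1.1.0 and later accept a list there. A reshape that does not depend on that, and does not aggregate, is set_index followed by unstack. The sketch below uses a few rows from the question; the duplicate row is hypothetical and only shows that this route raises instead of averaging.

```python
import pandas as pd

# A small subset of the rows from the question, with YrMnth as a regular column.
df = pd.DataFrame({
    "YrMnth": ["2014-12", "2014-12", "2014-12", "2015-01", "2015-01"],
    "Letter": ["B", "C", "C", "B", "C"],
    "Period": [6, 8, 9, 7, 10],
    "Amount": [0, 1, 2, 18, 27],
})

# Move all three keys into the index, then swing Period out into the columns.
wide = df.set_index(["YrMnth", "Letter", "Period"])["Amount"].unstack("Period")
print(wide)

# Hypothetical extra row (not in the question) repeating (2014-12, C, 8).
extra = pd.DataFrame(
    {"YrMnth": ["2014-12"], "Letter": ["C"], "Period": [8], "Amount": [3]}
)
with_dup = pd.concat([df, extra], ignore_index=True)

# Unlike pivot_table, unstack refuses to reshape when index entries repeat.
try:
    with_dup.set_index(["YrMnth", "Letter", "Period"])["Amount"].unstack("Period")
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("unstack failed:", err)
```

Which of the two to use depends on whether duplicates should be aggregated (pivot_table) or treated as an error (set_index plus unstack).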

This page, python - Groupby and Pivot a Pandas Table, is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/34315837/
