python - Unstack a DataFrame using only the relevant columns

I have the following DataFrame:

import pandas as pd

data = {'year': [2010, 2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012, 2013],
        'store_number': ['1944', '1945', '1946', '1947', '1948', '1949', '1947', '1948', '1949', '1947'],
        'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV', 'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
        'product': ['a', 'b', 'a', 'a', 'b', 'a', 'b', 'a', 'a', 'c'],
        'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11],
        'vat': [0.5, 0.5, 0.8, 0.6, 0.1, 0.5, 0.10, 0.6, 0.12, 0.11]}

stores = pd.DataFrame(data, columns=['retailer_name', 'store_number', 'year', 'product', 'amount', 'vat'])
stores.set_index(['retailer_name', 'store_number', 'year', 'product'], inplace=True)
# Sum per (retailer, store, year, product), then pivot product into columns.
df = stores.groupby(level=[0, 1, 2, 3]).sum().unstack('product')
# Fill missing values only in the 'amount' block of the column MultiIndex.
mask = pd.IndexSlice['amount', :]
df.loc[:, mask] = df.loc[:, mask].fillna(0)

I get the following output:

                                amount           vat           
product                              a   b   c     a    b     c
retailer_name store_number year                                
CRV           1946         2011      8   0   0  0.80  NaN   NaN
              1947         2012      6   0   0  0.60  NaN   NaN
                           2013      0   0  11   NaN  NaN  0.11
              1948         2011      6   1   0  0.60  0.1   NaN
              1949         2012     12   0   0  0.12  NaN   NaN
Walmart       1944         2010      5   0   0  0.50  NaN   NaN
              1945         2010      0   5   0   NaN  0.5   NaN
              1947         2010      0  10   0   NaN  0.1   NaN
              1949         2012      5   0   0  0.50  NaN   NaN

I don't need the vat columns in my final result. How can I drop them from the unstacked frame?

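Best answer

This works for me. After the unstack, select the top-level amount column and fill the missing values:

df = stores.groupby(level=[0, 1, 2, 3]).sum().unstack('product')

df = df['amount'].fillna(0)
print(df)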
product                             a     b     c
retailer_name store_number year                  
CRV           1946         2011   8.0   0.0   0.0
              1947         2012   6.0   0.0   0.0
                           2013   0.0   0.0  11.0
              1948         2011   6.0   1.0   0.0
              1949         2012  12.0   0.0   0.0
Walmart       1944         2010   5.0   0.0   0.0
              1945         2010   0.0   5.0   0.0
              1947         2010   0.0  10.0   0.0
              1949         2012   5.0   0.0   0.0
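
The selection works because the unstacked columns form a two-level MultiIndex (value column on top, product below), and indexing with a top-level key returns just that block with the top level removed. Here is a minimal sketch of that behaviour; the `small` frame and its values are made up for illustration and are not the question's data. The `drop(..., level=0)` variant is an alternative I am adding for cases where the two-level header should be kept:

```python
import pandas as pd

# Hypothetical, reduced data set used only for illustration.
small = pd.DataFrame({
    'store_number': ['1944', '1944', '1945'],
    'product': ['a', 'b', 'a'],
    'amount': [5, 3, 8],
    'vat': [0.5, 0.3, 0.8],
}).set_index(['store_number', 'product'])

wide = small.unstack('product')
# Columns are now a two-level MultiIndex: (value column, product).
print(wide.columns.nlevels)

# Selecting the top-level key keeps only the 'amount' block and
# drops that level, leaving plain product columns.
amount_only = wide['amount'].fillna(0)
print(list(amount_only.columns))

# Alternative: remove 'vat' but keep the two-level header.
no_vat = wide.drop(columns='vat', level=0)
print(no_vat.columns.get_level_values(0).unique().tolist())
```

Which form to prefer depends on whether later code expects flat product columns or the original two-level header.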

Or as a single chained expression:

df = stores.groupby(level=[0, 1, 2, 3]).sum().unstack('product')['amount'].fillna(0)
print(df)

product                             a     b     c
retailer_name store_number year                  
CRV           1946         2011   8.0   0.0   0.0
              1947         2012   6.0   0.0   0.0
                           2013   0.0   0.0  11.0
              1948         2011   6.0   1.0   0.0
              1949         2012  12.0   0.0   0.0
Walmart       1944         2010   5.0   0.0   0.0
              1945         2010   0.0   5.0   0.0
              1947         2010   0.0  10.0   0.0
              1949         2012   5.0   0.0   0.0

Another solution is to select the amount column before calling sum, so vat is never aggregated:

df = stores.groupby(level=[0, 1, 2, 3])['amount'].sum().unstack('product').fillna(0)
print(df)
product                             a     b     c
retailer_name store_number year                  
CRV           1946         2011   8.0   0.0   0.0
              1947         2012   6.0   0.0   0.0
                           2013   0.0   0.0  11.0
              1948         2011   6.0   1.0   0.0
              1949         2012  12.0   0.0   0.0
Walmart       1944         2010   5.0   0.0   0.0
              1945         2010   0.0   5.0   0.0
              1947         2010   0.0  10.0   0.0
              1949         2012   5.0   0.0   0.0
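
As a possible variation on the last approach, `unstack` accepts a `fill_value` argument, so the gaps can be filled during the reshape instead of with a separate `fillna` call. A side effect, as far as I can tell, is that integer amounts stay integers rather than being converted to floats. This is a sketch on a made-up `small` frame, not the question's data:

```python
import pandas as pd

# Hypothetical, reduced data set used only for illustration.
small = pd.DataFrame({
    'store_number': ['1944', '1944', '1945'],
    'product': ['a', 'b', 'a'],
    'amount': [5, 3, 8],
    'vat': [0.5, 0.3, 0.8],
}).set_index(['store_number', 'product'])

# Aggregate only 'amount', so 'vat' is never summed or reshaped.
totals = small.groupby(level=[0, 1])['amount'].sum()

# fill_value=0 fills the missing (store, product) pairs during the reshape.
result = totals.unstack('product', fill_value=0)
print(result)
```

The same pattern should carry over to the four-level index in the question by grouping on `level=[0, 1, 2, 3]`.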

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/37342414/
