I have the following DataFrame:
import pandas as pd

data = {'year': [2010, 2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012, 2013],
        'store_number': ['1944', '1945', '1946', '1947', '1948', '1949', '1947', '1948', '1949', '1947'],
        'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV', 'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
        'product': ['a', 'b', 'a', 'a', 'b', 'a', 'b', 'a', 'a', 'c'],
        'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11],
        'vat': [0.5, 0.5, 0.8, 0.6, 0.1, 0.5, 0.10, 0.6, 0.12, 0.11]}
stores = pd.DataFrame(data, columns=['retailer_name', 'store_number', 'year', 'product', 'amount', 'vat'])
stores.set_index(['retailer_name', 'store_number', 'year', 'product'], inplace=True)
# Sum per (retailer, store, year, product), then pivot product into columns
df = stores.groupby(level=[0, 1, 2, 3]).sum().unstack('product')
# Fill missing values only in the 'amount' columns
mask = pd.IndexSlice['amount', :]
df.loc[:, mask] = df.loc[:, mask].fillna(0)
I get the following output:
                                 amount        vat
product                           a   b   c     a     b     c
retailer_name store_number year
CRV           1946         2011   8   0   0  0.80   NaN   NaN
              1947         2012   6   0   0  0.60   NaN   NaN
                           2013   0   0  11   NaN   NaN  0.11
              1948         2011   6   1   0  0.60   0.1   NaN
              1949         2012  12   0   0  0.12   NaN   NaN
Walmart       1944         2010   5   0   0  0.50   NaN   NaN
              1945         2010   0   5   0   NaN   0.5   NaN
              1947         2010   0  10   0   NaN   0.1   NaN
              1949         2012   5   0   0  0.50   NaN   NaN
I don't need the vat columns in my final result. How can I drop them from the unstacked DataFrame?
Best answer
This works for me:
df = stores.groupby(level=[0, 1, 2, 3]).sum().unstack('product')
df = df['amount'].fillna(0)
print(df)
product                              a     b     c
retailer_name store_number year
CRV           1946         2011    8.0   0.0   0.0
              1947         2012    6.0   0.0   0.0
                           2013    0.0   0.0  11.0
              1948         2011    6.0   1.0   0.0
              1949         2012   12.0   0.0   0.0
Walmart       1944         2010    5.0   0.0   0.0
              1945         2010    0.0   5.0   0.0
              1947         2010    0.0  10.0   0.0
              1949         2012    5.0   0.0   0.0
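Why this drops vat: with two-level columns, selecting a top-level key returns only the sub-frame under that key and removes the level. The sketch below illustrates this on a small made-up frame (the names `wide`, `amount_only` and `kept` are hypothetical, not from the question). It also shows `xs` with `drop_level=False`, in case the 'amount' level should stay in the column index.

```python
import pandas as pd

nan = float('nan')

# Small made-up frame with two-level columns, mimicking an unstacked result.
cols = pd.MultiIndex.from_product([['amount', 'vat'], ['a', 'b']],
                                  names=[None, 'product'])
wide = pd.DataFrame([[8.0, nan, 0.8, nan],
                     [6.0, 1.0, 0.6, 0.1]], columns=cols)

# Selecting a top-level key keeps only its sub-columns and drops that level.
amount_only = wide['amount'].fillna(0)

# Alternative: keep the 'amount' level in the columns.
kept = wide.xs('amount', axis=1, level=0, drop_level=False)

print(amount_only)
print(kept.columns.tolist())
```

Which form to use depends on whether later code expects flat or two-level column labels.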
Combined into one line:
df = stores.groupby(level=[0, 1, 2, 3]).sum().unstack('product')['amount'].fillna(0)
print(df)
product                              a     b     c
retailer_name store_number year
CRV           1946         2011    8.0   0.0   0.0
              1947         2012    6.0   0.0   0.0
                           2013    0.0   0.0  11.0
              1948         2011    6.0   1.0   0.0
              1949         2012   12.0   0.0   0.0
Walmart       1944         2010    5.0   0.0   0.0
              1945         2010    0.0   5.0   0.0
              1947         2010    0.0  10.0   0.0
              1949         2012    5.0   0.0   0.0
Another solution is to select the amount column before calling sum, so that vat is never aggregated in the first place:
df = stores.groupby(level=[0, 1, 2, 3])['amount'].sum().unstack('product').fillna(0)
print(df)
product                              a     b     c
retailer_name store_number year
CRV           1946         2011    8.0   0.0   0.0
              1947         2012    6.0   0.0   0.0
                           2013    0.0   0.0  11.0
              1948         2011    6.0   1.0   0.0
              1949         2012   12.0   0.0   0.0
Walmart       1944         2010    5.0   0.0   0.0
              1945         2010    0.0   5.0   0.0
              1947         2010    0.0  10.0   0.0
              1949         2012    5.0   0.0   0.0
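For anyone who wants to confirm that the two approaches agree, here is a self-contained sketch using the question's data. The variable names (`grouped`, `select_after`, `select_before`, `as_int`) are illustrative. The last variant passes `fill_value=0` to `unstack`, which is not part of the original answer; it fills the gaps during the unstack itself and should avoid the conversion to floats.

```python
import pandas as pd

data = {'year': [2010, 2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012, 2013],
        'store_number': ['1944', '1945', '1946', '1947', '1948', '1949',
                         '1947', '1948', '1949', '1947'],
        'retailer_name': ['Walmart', 'Walmart', 'CRV', 'CRV', 'CRV',
                          'Walmart', 'Walmart', 'CRV', 'CRV', 'CRV'],
        'product': ['a', 'b', 'a', 'a', 'b', 'a', 'b', 'a', 'a', 'c'],
        'amount': [5, 5, 8, 6, 1, 5, 10, 6, 12, 11],
        'vat': [0.5, 0.5, 0.8, 0.6, 0.1, 0.5, 0.10, 0.6, 0.12, 0.11]}
keys = ['retailer_name', 'store_number', 'year', 'product']
stores = pd.DataFrame(data).set_index(keys)

grouped = stores.groupby(level=keys)

# Approach 1: aggregate everything, unstack, then keep only 'amount'.
select_after = grouped.sum().unstack('product')['amount'].fillna(0)

# Approach 2: pick 'amount' first, so 'vat' is never summed or unstacked.
select_before = grouped['amount'].sum().unstack('product').fillna(0)

# Variation (assumption, not from the answer): fill gaps while unstacking.
as_int = grouped['amount'].sum().unstack('product', fill_value=0)

# Raises AssertionError if the two frames differ.
pd.testing.assert_frame_equal(select_after, select_before)
print(as_int)
```

Selecting the column before aggregating does less work on wide frames, since the unused columns are skipped entirely.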
This question, "python - Unstacking a DataFrame with only the relevant columns", comes from Stack Overflow: https://stackoverflow.com/questions/37342414/