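I have a DataFrame with a two-level column index. I need to apply different aggregation functions to the two top-level keys (columns), but my code raises an error. How can I aggregate multiple columns in a multi-level DataFrame?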
import pandas as pd
from numpy import nan
from pandas import Period

# Monthly data with two column levels: (statistic, answer)
dic1 = {('count', 'N.A.'): {Period('1993-01', 'M'): 0,
                            Period('1993-02', 'M'): 0,
                            Period('1993-03', 'M'): 0},
        ('count', 'No'): {Period('1993-01', 'M'): 1,
                          Period('1993-02', 'M'): 1,
                          Period('1993-03', 'M'): 1},
        ('count', 'Yes'): {Period('1993-01', 'M'): 0,
                           Period('1993-02', 'M'): 0,
                           Period('1993-03', 'M'): 0},
        ('sum', 'N.A.'): {Period('1993-01', 'M'): nan,
                          Period('1993-02', 'M'): nan,
                          Period('1993-03', 'M'): nan},
        ('sum', 'No'): {Period('1993-01', 'M'): 6.5820000000000007,
                        Period('1993-02', 'M'): 131.1865,
                        Period('1993-03', 'M'): 133.31049999999999},
        ('sum', 'Yes'): {Period('1993-01', 'M'): nan,
                         Period('1993-02', 'M'): nan,
                         Period('1993-03', 'M'): nan}}
df1 = pd.DataFrame(dic1)
df1.to_timestamp(how='end').groupby(pd.TimeGrouper('A')).agg(
    {'count': ['max', 'min', 'median', 'last'],
     'sum': ['mean', 'max', 'last']})
Error: KeyError: 'sum'
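Best answer
One workaround is to pull out all of the count columns and all of the sum columns separately, and build the aggregation dict from their full column labels: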
In [11]: agg_dict = {col: ['mean', 'max' , 'median', 'last'] for col in df1.columns[df1.columns.get_level_values(0) == "count"]}
In [12]: agg_dict.update({col: ['mean', 'max' , 'last'] for col in df1.columns[df1.columns.get_level_values(0) == "sum"]})
In [13]: g = df1.to_timestamp(how='end').groupby(pd.TimeGrouper('A') )
In [14]: g.agg(agg_dict)
Out[14]:
             sum                                                            count
            N.A.                    No                       Yes             N.A.                       No                      Yes
            mean  max  last       mean       max      last  mean  max  last  mean  max  median  last  mean  max  median  last  mean  max  median  last
1993-12-31   NaN  NaN   NaN  90.359667  133.3105  133.3105   NaN  NaN   NaN     0    0       0     0     1    1       1     1     0    0       0     0
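The KeyError presumably arises because the keys of the dict passed to agg have to match full column labels, and with two-level columns those labels are tuples such as ('sum', 'No') rather than 'sum'. Note also that pd.TimeGrouper was deprecated and later removed, so the transcript above will not run on recent pandas. Below is a minimal sketch of an alternative that should work there. It is not the method from the answer: it aggregates each top-level block on its own and glues the pieces back together with pd.concat, and it groups by df1.index.year instead of a yearly time grouper, so the row label is 1993 rather than 1993-12-31. The names idx, cols, year, count_part, sum_part and result are made up for this illustration, and df1 is rebuilt compactly from the values in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: monthly PeriodIndex, two-level columns.
idx = pd.period_range('1993-01', periods=3, freq='M')
cols = pd.MultiIndex.from_product([['count', 'sum'], ['N.A.', 'No', 'Yes']])
df1 = pd.DataFrame(
    [[0, 1, 0, np.nan, 6.582, np.nan],
     [0, 1, 0, np.nan, 131.1865, np.nan],
     [0, 1, 0, np.nan, 133.3105, np.nan]],
    index=idx, columns=cols)

# Group by calendar year (assumption: stands in for the yearly time grouper).
year = df1.index.year

# Aggregate each top-level block with its own list of functions.
count_part = df1['count'].groupby(year).agg(['mean', 'max', 'median', 'last'])
sum_part = df1['sum'].groupby(year).agg(['mean', 'max', 'last'])

# Reattach the top level; the columns become (block, answer, function).
result = pd.concat({'count': count_part, 'sum': sum_part}, axis=1)
print(result)
```

A single value can then be read with a three-part column label, for example result.loc[1993, ('sum', 'No', 'mean')].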
This Q&A on aggregation over a multi-level index in Python is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/46837734/