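I am trying to pivot a pandas DataFrame using several aggregation functions, some of which are lambdas. Each column needs a distinct name so that I can aggregate with more than one lambda. I tried a few ideas I found online, but none of them worked. Here is a minimal example: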
import numpy as np
import pandas as pd
df = pd.DataFrame({'col1': [1, 1, 2, 3], 'col2': [4, 4, 5, 6], 'col3': [7, 10, 8, 9]})
pivoted_df = df.pivot_table(index=['col1', 'col2'], values='col3', aggfunc=[('lam1', lambda x: np.percentile(x, 50)), ('lam2', lambda x: np.percentile(x, 75))]).reset_index()
The error is:
AttributeError: 'SeriesGroupBy' object has no attribute 'lam1'
I also tried using a dictionary, which raised an error as well. Can anyone help? Thanks!
Best answer
Name the functions explicitly:
def lam1(x):
    return np.percentile(x, 50)

def lam2(x):
    return np.percentile(x, 75)

pivoted_df = df.pivot_table(index=['col1', 'col2'], values='col3',
                            aggfunc=[lam1, lam2]).reset_index()
Your aggregated series will then be named accordingly:
print(pivoted_df)
   col1  col2  lam1  lam2
0     1     4   8.5  9.25
1     2     5   8.0  8.00
2     3     6   9.0  9.00
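If flat, explicitly chosen column names are the goal, one alternative that the answer does not mention is groupby with named aggregation. The sketch below assumes a reasonably recent pandas (named aggregation was introduced around version 0.25) and reuses the question's data; `named_df` is just an illustrative variable name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 2, 3],
                   'col2': [4, 4, 5, 6],
                   'col3': [7, 10, 8, 9]})

# Named aggregation: each keyword becomes the output column name,
# so the lambdas do not need names of their own.
named_df = (
    df.groupby(['col1', 'col2'])['col3']
      .agg(lam1=lambda x: np.percentile(x, 50),
           lam2=lambda x: np.percentile(x, 75))
      .reset_index()
)
print(named_df)
```

This skips pivot_table entirely, which is fine here because the question only groups by index columns and never spreads values across a `columns` argument.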
The docs for pd.pivot_table explain why:
aggfunc : function, list of functions, dict, default numpy.mean
If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves). If dict is passed, the key is column to aggregate and value is function or list of functions
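Since the labels are inferred from the function objects, every lambda ends up with the same name, `<lambda>`, which is why plain lambdas cannot be told apart. As a rough sketch, and assuming pandas takes the label from the function's `__name__` attribute (which appears to be the case in current versions), a small factory can stamp a name onto a generated function. `named_percentile`, `p50`, and `p75` are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd


def named_percentile(q, name):
    """Hypothetical helper: build a percentile function whose __name__ is `name`."""
    def func(x):
        return np.percentile(x, q)
    func.__name__ = name  # assumption: pandas uses this as the column label
    return func


# All lambdas share one name, so pandas cannot distinguish them.
print((lambda x: x).__name__)  # <lambda>

df = pd.DataFrame({'col1': [1, 1, 2, 3],
                   'col2': [4, 4, 5, 6],
                   'col3': [7, 10, 8, 9]})

table = df.pivot_table(index=['col1', 'col2'], values='col3',
                       aggfunc=[named_percentile(50, 'p50'),
                                named_percentile(75, 'p75')])

# The top level of the columns carries the function names.
top_level = list(table.columns.get_level_values(0))
print(top_level)
```

The factory avoids writing one `def` per percentile, at the cost of relying on how pandas derives the label, so it is worth checking against the pandas version in use.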
This is based on a similar question found on Stack Overflow, "python - Pandas .pivot_table: How to name functions for aggregation": https://stackoverflow.com/questions/52870521/