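I am trying to add a column to a Pandas GroupBy DataFrame that has a multi-index. The column is the difference between the max and the mean of the common key after grouping.
Here is the input DataFrame: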
   Main  Reads  Test  Subgroup
0     1      5    54         1
1     2      2    55         1
2     1     10    56         2
3     2     20    57         3
4     1      7    58         3
Here is the code:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Main': [1, 2, 1, 2, 1], 'Reads': [5, 2, 10, 20, 7],
                   'Test': range(54, 59), 'Subgroup': [1, 1, 2, 3, 3]})
result = df.groupby(['Main','Subgroup']).agg({'Reads':[np.max,np.mean]})
Here is the variable result before the diff calculation:
              Reads
               amax mean
Main Subgroup
1    1            5    5
     2           10   10
     3            7    7
2    1            2    2
     3           20   20
Next, I compute the diff column:
result['Reads']['diff'] = result['Reads']['amax'] - result['Reads']['mean']
But this is the output:
/home/userd/test.py:9: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/
...stable/indexing.html#indexing-view-versus-copy
...result['Reads']['diff'] = result['Reads']['amax'] - result['Reads']['mean']
I would like the diff column to be at the same level as amax and mean.
Is there a way to add a column to the innermost (bottom) level of the column index of a multi-indexed Pandas GroupBy() result?
Best answer
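You can address a column of a multi-index with a tuple, which assigns directly on result instead of on the intermediate object returned by result['Reads']: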
result[('Reads','diff')] = result[('Reads','amax')] - result[('Reads','mean')]
This gives:
              Reads
               amax mean diff
Main Subgroup
1    1            5    5    0
     2           10   10    0
     3            7    7    0
2    1            2    2    0
     3           20   20    0
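For reference, here is a self-contained sketch of the tuple-based assignment using the data from the question. One assumption that differs from the original code: the aggregations are passed as the strings 'max' and 'mean' rather than np.max and np.mean, so the inner column is named max instead of amax. The name produced by np.max can vary between pandas versions, and the string form avoids that.

```python
import pandas as pd

df = pd.DataFrame({'Main': [1, 2, 1, 2, 1], 'Reads': [5, 2, 10, 20, 7],
                   'Test': range(54, 59), 'Subgroup': [1, 1, 2, 3, 3]})

# Assumption: string aggregation names, so the inner level is 'max'/'mean'
# (the question used np.max/np.mean, which showed up as 'amax'/'mean').
result = df.groupby(['Main', 'Subgroup']).agg({'Reads': ['max', 'mean']})

# A tuple key addresses (outer level, inner level) in a single step,
# so the new column is created on result itself, next to 'max' and 'mean'.
result[('Reads', 'diff')] = result[('Reads', 'max')] - result[('Reads', 'mean')]

print(result)
```

With this sample data every (Main, Subgroup) group contains a single row, so the max equals the mean and diff is 0 everywhere.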
For more on adding a column to a multi-index GroupBy DataFrame in Python Pandas, see the similar question on Stack Overflow: https://stackoverflow.com/questions/44011267/