I have the following dataframe:
import pandas as pd

# Create a dataframe
raw_data = {'trial_num': ['1', '1', '2', '2', '3', '3'],
            'area': ['first', 'second', 'first', 'second', 'first', 'second'],
            'counts': [10, 25, 36, 2, 70, 10]}
df = pd.DataFrame(raw_data, columns=['trial_num', 'area', 'counts'])
  trial_num    area  counts
0         1   first      10
1         1  second      25
2         2   first      36
3         2  second       2
4         3   first      70
5         3  second      10
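I would like to add a new column "proportion" that expresses each count as a proportion of the total count for its trial_num (that is, summed over both areas of the trial). Like this: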
  trial_num    area  counts  total_count           proportion
0         1   first      10           35   0.2857142857142857
1         1  second      25           35   0.7142857142857143
2         2   first      36           38   0.9473684210526315
3         2  second       2           38  0.05263157894736842
4         3   first      70           80                0.875
5         3  second      10           80                0.125
I have only got this far:
df.counts.groupby(df.trial_num).sum()
trial_num
1 35
2 38
3 80
Is there an efficient way to do this without breaking up the dataframe? Please help.
Best answer
You can divide with div by the Series created by GroupBy.transform, which has the same size as the original df:
df['proportion'] = df['counts'].div(df.groupby(['trial_num'])['counts'].transform('sum'))
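The desired output in the question also includes a total_count column, which the one-liner above does not keep. A minimal sketch, assuming the same sample data, is to store the transform result first and then divide by it. The name `result` is only illustrative and is not part of the original answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'trial_num': ['1', '1', '2', '2', '3', '3'],
    'area': ['first', 'second', 'first', 'second', 'first', 'second'],
    'counts': [10, 25, 36, 2, 70, 10],
})

# Work on a copy so the original frame stays untouched (illustrative name)
result = df.copy()

# transform('sum') returns one value per original row (the total of that
# row's group), so it shares df's index and can be assigned as a column.
result['total_count'] = result.groupby('trial_num')['counts'].transform('sum')

# Each row's share of its trial total
result['proportion'] = result['counts'] / result['total_count']
```

Keeping the totals as a column makes the denominator visible next to each proportion, at the cost of one extra column.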
Alternative: aggregate the sums per trial_num, then align them back to the rows with map:
s = df.groupby(['trial_num'])['counts'].sum()
df['proportion'] = df['counts'].div(df['trial_num'].map(s))
print(df)
  trial_num    area  counts  proportion
0         1   first      10    0.285714
1         1  second      25    0.714286
2         2   first      36    0.947368
3         2  second       2    0.052632
4         3   first      70    0.875000
5         3  second      10    0.125000
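As a quick sanity check, the transform and map approaches should give the same values, and the proportions within each trial should sum to 1. The sketch below assumes the same sample data. The variable names (`via_transform`, `via_map`, `same`, `group_sums`) are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'trial_num': ['1', '1', '2', '2', '3', '3'],
    'area': ['first', 'second', 'first', 'second', 'first', 'second'],
    'counts': [10, 25, 36, 2, 70, 10],
})

# Approach 1: broadcast group totals to every row with transform
via_transform = df['counts'].div(
    df.groupby('trial_num')['counts'].transform('sum')
)

# Approach 2: aggregate to one total per trial, then look it up per row
totals = df.groupby('trial_num')['counts'].sum()
via_map = df['counts'].div(df['trial_num'].map(totals))

# Both approaches should agree row by row
same = np.allclose(via_transform, via_map)

# Proportions inside each trial should add up to 1
group_sums = via_transform.groupby(df['trial_num']).sum()
```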
This is based on a similar question found on Stack Overflow, "python - How to add a column containing aggregate information over rows?": https://stackoverflow.com/questions/49449259/