python - Create a Pandas DataFrame column from the result of a conditional sum

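Related to this question about calculating DataFrame values based on conditions, I have a more complex problem: I need a sum whose condition depends on the row I am currently processing. Here is the initial df: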

Key UID VID count   month   option  unit    year
0   1   5   100     1       A       10      2015
1   1   5   200     1       B       20      2015
2   1   5   300     2       A       30      2015
3   1   5   400     2       B       40      2015
4   1   7   450     2       B       45      2015
5   1   5   500     3       B       50      2015

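I want to iterate over this time-series DataFrame and add a "unit_count" column to every row. Its value is the row's "unit" divided by the sum of "count" for the same month, where only rows with option "B" contribute to the sum. Essentially: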

df['unit_count'] = df['unit'] / sum of df['count'] for all records containing 'option' 'B' in the same month

This would extend the DataFrame as follows:

Key UID VID count   month   option  unit    year    unit_count
0   1   5   100     1       A       10      2015    0.050
1   1   5   200     1       B       20      2015    0.100
2   1   5   300     2       A       30      2015    0.035
3   1   5   400     2       B       40      2015    0.047
4   1   7   450     2       B       45      2015    0.053
5   1   5   500     3       B       50      2015    0.100

The code for the sample df above is:

import pandas as pd

df = pd.DataFrame({'UID':[1,1,1,1,1,1],
                   'VID':[5,5,5,5,7,5],
                   'year':[2015,2015,2015,2015,2015,2015],
                   'month':[1,1,2,2,2,3],
                   'option':['A','B','A','B','B','B'],
                   'unit':[10,20,30,40,45,50],
                   'count':[100,200,300,400,450,500]
                   })

Best answer

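You only want to look within the same month, so you can group by the month column. Within each group, use option == "B" to subset the count column, sum it, and divide the unit column by that sum (a direct translation of the logic):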

df['unit_count'] = df.groupby('month', group_keys=False).apply(
                      lambda g: g.unit/g['count'][g.option == "B"].sum())
df
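
As an alternative to apply with a lambda, the same logic can be sketched with a masked column and transform. This is only a sketch, not part of the original answer: the names b_count and b_total are made up for illustration, and the sample df from the question is rebuilt so the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question, rebuilt so the snippet is self-contained.
df = pd.DataFrame({
    'UID': [1, 1, 1, 1, 1, 1],
    'VID': [5, 5, 5, 5, 7, 5],
    'year': [2015] * 6,
    'month': [1, 1, 2, 2, 2, 3],
    'option': ['A', 'B', 'A', 'B', 'B', 'B'],
    'unit': [10, 20, 30, 40, 45, 50],
    'count': [100, 200, 300, 400, 450, 500],
})

# Hypothetical helper: keep 'count' only on option "B" rows, 0 elsewhere.
b_count = df['count'].where(df['option'] == 'B', 0)

# Sum the masked counts per month and broadcast the total back to every row.
b_total = b_count.groupby(df['month']).transform('sum')

# Divide each row's unit by the "B" total of its own month.
df['unit_count'] = df['unit'] / b_total
print(df['unit_count'].round(3).tolist())
```

Because transform returns a Series aligned with the original index, the result can be assigned directly without group_keys=False. Note that a month with no "B" rows would have a total of 0, so the division would produce inf for those rows; whether that is acceptable depends on your data.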

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/42011948/
