Python pandas: Find the sum of a column based on the values of two other columns

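While going through the variableA column, I want to create a new column holding the sum of the values column over every row in which either variableA or variableB equals the current row's variableA value. Sample data: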

       values  variableA  variableB
    0     134          1          3
    1      12          2          6
    2      43          1          2
    3      54          3          1
    4      16          2          7

I can get the sum of values over every row whose variableA matches the current row's variableA with:

    df.groupby('variableA')['values'].transform('sum')
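As a quick check, here is a minimal sketch that rebuilds the sample frame and runs that transform call. It covers only the variableA half of the rule, so rows that match through variableB are not counted yet.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'values': [134, 12, 43, 54, 16],
    'variableA': [1, 2, 1, 3, 2],
    'variableB': [3, 6, 2, 1, 7],
})

# Sum of values over rows sharing the same variableA (variableB is ignored)
partial = df.groupby('variableA')['values'].transform('sum')
print(partial.tolist())  # [177, 28, 177, 54, 28]
```

Compared with the expected result below, row 0 is short by 54 (row 3, where variableB is 1), which is exactly the variableB contribution that is missing.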

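But I can't figure out how to also add the values from rows whose variableB matches the current row's variableA. I tried .loc, but it doesn't seem to combine well with .groupby. The expected output is: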

       values  variableA  variableB  result
    0     134          1          3     231
    1      12          2          6      71
    2      43          1          2     231
    3      54          3          1     188
    4      16          2          7      71

Thanks!
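The rule in the question can be pinned down with a plain loop. For each row, take its variableA value and add up values from every row where that value appears in variableA or variableB, counting each row once. This is a slow O(n²) sketch meant only for checking results; the helper name reference_sums is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'values': [134, 12, 43, 54, 16],
    'variableA': [1, 2, 1, 3, 2],
    'variableB': [3, 6, 2, 1, 7],
})

def reference_sums(frame):
    """Hypothetical helper: explicit loop that spells out the desired rule."""
    out = []
    for key in frame['variableA']:
        # A row counts once if key appears in either column (or in both)
        hit = (frame['variableA'] == key) | (frame['variableB'] == key)
        out.append(int(frame.loc[hit, 'values'].sum()))
    return out

print(reference_sums(df))  # [231, 71, 231, 188, 71]
```

For row 3 (variableA = 3), the matching rows are row 3 itself and row 0 (variableB = 3), giving 54 + 134 = 188.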

Best answer

A vectorized approach using NumPy broadcasting:

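    # Stack the two key columns into an (n, 2) array (avoid shadowing the builtin vars)
    keys = df[['variableA', 'variableB']].values
    # (n, 1, 2) == (n, 1) broadcasts to (n, n, 2); any(-1) collapses the last axis, so
    # matches[i, j] is True when row i holds row j's variableA in either column
    matches = (keys[:, None] == keys[:, [0]]).any(-1)

    # Python 3.5+: result[j] is the sum of values[i] over rows i with matches[i, j]
    df = df.assign(result=df['values'].values @ matches)
    # Python 2 has no @ operator; use .dot() instead
    # df = df.assign(result=df['values'].values.dot(matches))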
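The broadcast comparison builds an n × n × 2 boolean array, so its memory use grows quadratically with the row count. An alternative sketch that stays in pandas and needs only per-key totals uses inclusion-exclusion: the sum keyed by variableA, plus the sum keyed by variableB, minus the rows where both columns hold the same key (those would otherwise be counted twice). This is a swapped-in technique, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'values': [134, 12, 43, 54, 16],
    'variableA': [1, 2, 1, 3, 2],
    'variableB': [3, 6, 2, 1, 7],
})

# Total of values per key, once keyed by each column
sum_by_a = df.groupby('variableA')['values'].sum()
sum_by_b = df.groupby('variableB')['values'].sum()

# Rows whose two columns hold the same key appear in both totals;
# collect their sums so they can be subtracted once
same = df[df['variableA'] == df['variableB']]
sum_both = same.groupby('variableA')['values'].sum()

key = df['variableA']
df['result'] = (
    key.map(sum_by_a)                # every key occurs in variableA
    + key.map(sum_by_b).fillna(0)    # the key may never occur in variableB
    - key.map(sum_both).fillna(0)    # undo the double count
).astype(int)

print(df['result'].tolist())  # [231, 71, 231, 188, 71]
```

The broadcasting version is shorter; this one trades brevity for memory that scales with the number of distinct keys rather than with n², which matters for large frames.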

A similar question, "python Pandas : Find Sum of Column Based on Value of Two other Columns", can be found on Stack Overflow: https://stackoverflow.com/questions/41708223/
