python-3.x - Pandas: How to sum a column of one DataFrame based on the values of another DataFrame

I'm new to Pandas and I'm trying to do the following:

I have a DataFrame called comms that contains articleID and commentScore (among other columns).
I have another DataFrame called arts that contains an articleID column.

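I need to create a new column in arts called articleScore. Each article's articleScore is the sum of all commentScores belonging to that article (same articleID), divided by sqrt(n_comms + 1), where n_comms is the number of comments with that particular ID.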
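Written as a formula, for an article with ID a and n_a matching comments, the score described above is as follows. The worked line plugs in the two 1x5w comments (scores 2 and 3) from the example tables in the edit further down.

```latex
\text{articleScore}_a
  = \frac{\sum_{c \,:\, \text{articleID}_c = a} \text{commentScore}_c}{\sqrt{n_a + 1}}

% 1x5w has two comments with scores 2 and 3:
\text{articleScore}_{\text{1x5w}} = \frac{2 + 3}{\sqrt{2 + 1}} = \frac{5}{\sqrt{3}} \approx 2.89
```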

I've managed to do it, but it's very inefficient (see below):

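import math

scores = []
for _, article in arts.iterrows():
    n, tempScore = 0, 0
    # Scan every comment looking for the ones that belong to this article
    for _, value in comms.iterrows():
        if value['articleID'] == article['articleID']:
            tempScore += value['commentScore']
            n += 1
    scores.append(tempScore / math.sqrt(n + 1))
arts['articleScore'] = scores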

Edit: here is an example of what I'd like to happen:

comms:
__________________________
| # | artID | commScore  |
| 0 | 1x5w  |     2      |
| 1 | 77k3  |     1      |
| 2 | 77k3  |    -1      |
| 3 | 3612  |     5      |
| 4 | 1x5w  |     3      |
--------------------------

arts:
___________________________
| # | artID | artScore (?) |
| 0 | 1x5w  |    2.89      |
| 1 | 77k3  |     0        |
| 2 | 3612  |    3.54      |
-------------------------

I need to (create and) fill the artScore column. Each artScore is the sum of the commentScores of only those comments that have the same artID as the article, divided by sqrt(n+1).
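A minimal sketch that reproduces the example above, assuming the columns are named artID and commScore as in the tables. It computes the per-article sum and count once with groupby, then looks each article up with Series.map instead of looping over every comment for every article.

```python
import numpy as np
import pandas as pd

# Example data copied from the tables above
comms = pd.DataFrame({
    'artID': ['1x5w', '77k3', '77k3', '3612', '1x5w'],
    'commScore': [2, 1, -1, 5, 3],
})
arts = pd.DataFrame({'artID': ['1x5w', '77k3', '3612']})

# One row per artID with the sum and the count of its comment scores
stats = comms.groupby('artID')['commScore'].agg(['sum', 'count'])
score_by_id = stats['sum'] / np.sqrt(stats['count'] + 1)

# Look up each article's ID in the per-article scores (index = artID)
arts['artScore'] = arts['artID'].map(score_by_id)
print(arts.round(2))
```

Rounded to two decimals, the artScore values come out as 2.89, 0.00 and 3.54, matching the expected table.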

Can anyone help me? Thanks a lot!

Andrea

Best answer

I think you can use groupby and then merge on 'artID':

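import numpy as np

grpd = comms.groupby('artID')['commScore']  # group only the score column so other columns don't interfere
to_merge = grpd.sum().divide(np.sqrt(grpd.count() + 1)).reset_index().rename(columns={'commScore': 'artScore'})
arts = arts.merge(to_merge, on='artID')  # merge returns a new DataFrame, so assign it back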
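One caveat: merge defaults to an inner join, so an article with no comments would be dropped from the result. A sketch of a variant that keeps every article, assuming a score of 0 is wanted in that case (sum 0 divided by sqrt(0 + 1)). The article ID 'zz00' is hypothetical, added only to show the no-comment case; the aggregation uses pandas named aggregation.

```python
import numpy as np
import pandas as pd

comms = pd.DataFrame({
    'artID': ['1x5w', '77k3', '77k3', '3612', '1x5w'],
    'commScore': [2, 1, -1, 5, 3],
})
# 'zz00' is a hypothetical article with no comments
arts = pd.DataFrame({'artID': ['1x5w', '77k3', '3612', 'zz00']})

# Named aggregation: one row per artID with the total score and comment count
stats = (
    comms.groupby('artID')
    .agg(total=('commScore', 'sum'), n=('commScore', 'count'))
    .reset_index()
)
stats['artScore'] = stats['total'] / np.sqrt(stats['n'] + 1)

# how='left' keeps every article, even those with no matching comments
arts = arts.merge(stats[['artID', 'artScore']], on='artID', how='left')
# No comments means sum = 0 and n = 0, so the score is 0 / sqrt(1) = 0
arts['artScore'] = arts['artScore'].fillna(0.0)
print(arts)
```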

Original question on Stack Overflow (python-3.x - Pandas: How to sum a column of one DataFrame based on the values of another DataFrame): https://stackoverflow.com/questions/67139587/
