python - How to group and sum only within the grouped column

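I currently have the following DataFrame. I want to group by the Probeset column and, for each phchp subject, get a single value by summing the rows that share the same Probeset value.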

       Probeset  phchp230v2  phchp273v3  phchp367v3
0    1554784_at    0.000000    0.000000    0.000000
1    1554784_at    0.000000    0.000000    0.000000
2     212983_at    0.244668    0.032524    0.113343
3     212983_at    0.022178    0.013750    0.011871
4  1566643_a_at    0.048200    0.089618    0.046528

What I am looking for is this:

       Probeset  phchp230v2  phchp273v3  phchp367v3
0    1554784_at    0           0           0
1    1554784_at    0           0           0 
2     212983_at    0.266846    0.046274    0.125214
3     212983_at    0.266846    0.046274    0.125214
4  1566643_a_at    0.048200    0.089618    0.046528

I tried the following without success; it does not group correctly:

for x in df_out:
    if 'phchp' in x:
        df_out[x] = df_out.groupby(['Probeset'])[x].sum()

Best answer

You can use groupby + transform and then assign the result back to the DataFrame.

df1 = df.groupby('Probeset').transform('sum')
df[df1.columns] = df1

print(df)

       Probeset  phchp230v2  phchp273v3  phchp367v3
0    1554784_at    0.000000    0.000000    0.000000
1    1554784_at    0.000000    0.000000    0.000000
2     212983_at    0.266846    0.046274    0.125214
3     212983_at    0.266846    0.046274    0.125214
4  1566643_a_at    0.048200    0.089618    0.046528
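
To reproduce the result above end to end, here is a minimal sketch that rebuilds the sample frame from the question. The names `value_cols` and `result` are illustrative only, and selecting columns by the substring "phchp" is an assumption carried over from the question's loop.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Probeset": ["1554784_at", "1554784_at", "212983_at", "212983_at", "1566643_a_at"],
    "phchp230v2": [0.0, 0.0, 0.244668, 0.022178, 0.048200],
    "phchp273v3": [0.0, 0.0, 0.032524, 0.013750, 0.089618],
    "phchp367v3": [0.0, 0.0, 0.113343, 0.011871, 0.046528],
})

# Pick the value columns by name so any other columns are left untouched.
value_cols = [c for c in df.columns if "phchp" in c]

# transform('sum') returns one row per original row, so the result
# shares df's index and can be assigned straight back.
result = df.copy()
result[value_cols] = df.groupby("Probeset")[value_cols].transform("sum")
print(result)
```

Working on a copy keeps the original values available, which matters if you want to rerun the grouping later: summing a frame that has already been transformed would count each group's total once per row.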

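Your loop was not far off; you just need to use transform. With transform, the result of the groupby aggregation is broadcast to every row belonging to the group, so it aligns with the DataFrame's index. Without transform, the groupby result is indexed by the group keys, so if your DataFrame has a RangeIndex, a plain assignment back to it will not align. The small change needed is: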

for x in df:
    if 'phchp' in x:
        df[x] = df.groupby('Probeset')[x].transform('sum')

For clarity, here is the difference between the groupby result with and without transform.

# Index is the unique values of `'Probeset'`
df.groupby('Probeset')['phchp367v3'].sum()
#Probeset
#1554784_at      0.000000
#1566643_a_at    0.046528
#212983_at       0.125214
#Name: phchp367v3, dtype: float64


# Index is the same as the original DataFrame
df.groupby('Probeset')['phchp367v3'].transform('sum')
#0    0.000000
#1    0.000000
#2    0.125214
#3    0.125214
#4    0.046528
#Name: phchp367v3, dtype: float64
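
The alignment problem described above can be seen directly in a short sketch. It uses only one value column from the question's data; the names `sums`, `misaligned`, `mapped` and `transformed` are illustrative. The `map` step is shown as an equivalent alternative for a single column, not as something the original answer proposed.

```python
import pandas as pd

df = pd.DataFrame({
    "Probeset": ["1554784_at", "1554784_at", "212983_at", "212983_at", "1566643_a_at"],
    "phchp367v3": [0.0, 0.0, 0.113343, 0.011871, 0.046528],
})

# Plain aggregation: one row per group, indexed by the Probeset values.
sums = df.groupby("Probeset")["phchp367v3"].sum()

# Column assignment aligns on index labels. The labels of `sums` are
# Probeset strings while df's labels are 0..4, so nothing matches and
# the whole column becomes NaN.
misaligned = df.copy()
misaligned["phchp367v3"] = sums
print(misaligned["phchp367v3"].isna().all())

# Looking up each row's Probeset in `sums` restores the row-wise shape,
# which is what transform('sum') does in a single step.
mapped = df["Probeset"].map(sums)
transformed = df.groupby("Probeset")["phchp367v3"].transform("sum")
print(mapped.tolist())
print(transformed.tolist())
```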

Regarding python - How to group and sum only within the grouped column, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/69210675/
