python - In Python, move rows into columns when IDs match, while summing specific columns

Tags: python pandas dataframe group-by

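My spouse data is on separate rows, but both spouses share the same ID. In some cases an ID appears on more than two rows. When the IDs match, I need to move the spouse row into columns so that both spouses share a single row. I also need to sum the value columns.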

Input

   ID  Position  Title  First  Last  SpTitle  SpFirst  SpLast  Address    Value1  Value2  Value3
0  456 HoH       Mr.    John   Doe   NaN      NaN      NaN     123 street  10      NaN     30
1  456 Spouse    Mrs.   Jane   Doe   NaN      NaN      NaN     123 street  10      NaN     30
2  789 HoH       Mrs.   Jane   Doe   NaN      NaN      NaN     456 road    100     200     300
3  789 HoH       Mrs.   Jane   Doe   NaN      NaN      NaN     456 road    400     500     600
4  789 Spouse    Mr.    John   Doe   NaN      NaN      NaN     456 road    NaN     10      30
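
To experiment with this, the input table above can be rebuilt as a DataFrame. This is only a sketch: the dtypes and the use of `np.nan` for the empty cells are assumptions, since the real data comes from an Excel workbook.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the input table; NaN marks the empty cells (assumed).
df = pd.DataFrame({
    "ID": [456, 456, 789, 789, 789],
    "Position": ["HoH", "Spouse", "HoH", "HoH", "Spouse"],
    "Title": ["Mr.", "Mrs.", "Mrs.", "Mrs.", "Mr."],
    "First": ["John", "Jane", "Jane", "Jane", "John"],
    "Last": ["Doe"] * 5,
    "SpTitle": [np.nan] * 5,
    "SpFirst": [np.nan] * 5,
    "SpLast": [np.nan] * 5,
    "Address": ["123 street", "123 street", "456 road", "456 road", "456 road"],
    "Value1": [10, 10, 100, 400, np.nan],
    "Value2": [np.nan, np.nan, 200, 500, 10],
    "Value3": [30, 30, 300, 600, 30],
})

print(df)
```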

Desired output

   ID  Position  Title  First  Last  SpTitle  SpFirst  SpLast  Address    Value1  Value2  Value3
0  456 HoH       Mr.    John   Doe   Mrs.     Jane     Doe     123 street  20      NaN     60
1  789 HoH       Mrs.   Jane   Doe   Mr.      John     Doe     456 road    500     710     930

There are thousands of rows, and some IDs repeat across dozens of rows. I think this calls for groupby and agg, but I can't seem to get it to work.

When the IDs match, how can I move the spouse onto the same row as the head of household (HoH) while summing the values?

Here is what I have so far:

import pandas as pd
import numpy as np

# Combine sheets
df = pd.concat(pd.read_excel("C:/Users/Sheet.xlsx", sheet_name=None), ignore_index=True)

# Drop blank IDs
df = df[df['ID'].notna()]

# Insert Spouse columns
df.insert(loc = 10, column='SpTitle', value = '')
df.insert(loc = 11, column='SpFirstName', value = '')
df.insert(loc = 12, column='SpMiddleName', value = '')
df.insert(loc = 13, column='SpLastName', value = '')
df.insert(loc = 14, column='SpBirthDate', value = '')
df.insert(loc = 15, column='SpGender', value = '')

m = df.Position.eq("Spouse")

# Copy the spouse's own fields into the Sp* columns (source order must match target order)
df.loc[m, ["SpTitle", "SpFirstName", "SpMiddleName", "SpLastName", "SpBirthDate", "SpGender"]] = df.loc[
    m, ["Title", "First Name", "Middle Name", "Last Name", "Date of Birth", "Gender"]
].values

df[["Value 2019", "Value 2020", "Value 2021", "Value 2022", "Fund 2019", "Fund 2020", "Fund 2022", "Fund 2021"]] = df.groupby("ID", as_index=False)[
    ["Value 2019", "Value 2020", "Value 2021", "Value 2022", "Fund 2019", "Fund 2020", "Fund 2022", "Fund 2021"]
].transform(np.sum, min_count=1)

df[["SpTitle", "SpFirstName", "SpMiddleName", "SpLastName", "SpBirthDate", "SpGender"]] = df.groupby("ID", as_index=False)[
    ["SpTitle", "SpFirstName", "SpMiddleName", "SpLastName", "SpBirthDate", "SpGender"]
].transform(lambda x: x.ffill().bfill())

df = df[~m].drop_duplicates()

df.to_csv("C:/Users/data.csv", index = False)

Best answer

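After grouping by ID you can apply a different aggregation to each column, then fill the spouse values from the rows where Position is Spouse into the aggregated output.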

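# Index by ID so the groupby below can group on the index level
df = df.set_index('ID')
# Title/first/last of every spouse row as a plain array, in row order
spouses = df.loc[df['Position'].eq('Spouse'), ['Title', 'First', 'Last']].values
# Sum the value columns; take the first non-null entry per ID for everything else
agg_dict = {col : 'sum' if col in ['Value1', 'Value2', 'Value3'] else 'first' for col in df.columns.tolist()}

out = df.groupby(level=0).agg(agg_dict).reset_index()
# Positional assignment: relies on exactly one spouse row per ID, in the same order as the grouped IDs
out.loc[:, ['SpTitle', 'SpFirst', 'SpLast']] = spouses

print(out)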

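Output (Value2 for ID 456 comes out as 0.0 rather than NaN, because sum skips NaN by default and an all-NaN group therefore sums to 0):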

    ID Position Title First Last SpTitle SpFirst SpLast     Address  Value1  Value2  Value3
0  456      HoH   Mr.  John  Doe    Mrs.    Jane    Doe  123 street    20.0     0.0      60
1  789      HoH  Mrs.  Jane  Doe     Mr.    John    Doe    456 road   500.0   710.0     930
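
The positional assignment above only lines up when every ID has exactly one spouse row and the spouse rows appear in the same order as the grouped IDs. A more defensive variant of the same idea is sketched below. It aligns spouses by ID with an index join and passes `min_count=1` so an all-NaN group stays NaN, as in the desired output. The sketch assumes the empty `Sp*` placeholder columns are not in the frame (drop them first if they are), and the row for ID 999 is a made-up household without a spouse, added only to show the NaN fill.

```python
import numpy as np
import pandas as pd

# Sample data based on the question; the ID 999 row is hypothetical.
df = pd.DataFrame({
    "ID": [456, 456, 789, 789, 789, 999],
    "Position": ["HoH", "Spouse", "HoH", "HoH", "Spouse", "HoH"],
    "Title": ["Mr.", "Mrs.", "Mrs.", "Mrs.", "Mr.", "Ms."],
    "First": ["John", "Jane", "Jane", "Jane", "John", "Ann"],
    "Last": ["Doe", "Doe", "Doe", "Doe", "Doe", "Lee"],
    "Address": ["123 street", "123 street", "456 road", "456 road", "456 road", "789 lane"],
    "Value1": [10, 10, 100, 400, np.nan, 1],
    "Value2": [np.nan, np.nan, 200, 500, 10, 2],
    "Value3": [30, 30, 300, 600, 30, 3],
})

value_cols = ["Value1", "Value2", "Value3"]
name_cols = ["Title", "First", "Last"]

# Sum over every row of each ID; min_count=1 keeps NaN when a group has no values.
sums = df.groupby("ID")[value_cols].sum(min_count=1)

is_spouse = df["Position"].eq("Spouse")

# One head-of-household row per ID (first occurrence), indexed by ID.
hoh = (
    df.loc[~is_spouse]
    .drop_duplicates("ID")
    .set_index("ID")
    .drop(columns=value_cols)
)

# One spouse row per ID, with the name columns renamed to SpTitle/SpFirst/SpLast.
spouse = (
    df.loc[is_spouse, ["ID"] + name_cols]
    .drop_duplicates("ID")
    .set_index("ID")
    .add_prefix("Sp")
)

# Index-aligned joins: IDs without a spouse get NaN in the Sp* columns.
out = hoh.join(spouse).join(sums).reset_index()
print(out)
```

The column order differs from the desired layout (the `Sp*` columns land after `Address`); selecting the columns in the wanted order at the end, for example `out[[...]]`, would fix that.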

This question, "python - In Python, move rows into columns when IDs match, while summing specific columns", comes from Stack Overflow: https://stackoverflow.com/questions/73806372/
