我的配偶数据位于不同的行,但每个配偶共享相同的 ID。在某些情况下,这些 ID 位于多行上。当 ID 匹配时,我需要将配偶行移至一列,以便配偶双方共享一行。然后我还需要对值进行求和。
输入
ID Position Title First Last SpTitle SpFirst SpLast Address Value1 Value2 Value3
0 456 HoH Mr. John Doe NaN NaN NaN 123 street 10 NaN 30
1 456 Spouse Mrs. Jane Doe NaN NaN NaN 123 street 10 NaN 30
2 789 HoH Mrs. Jane Doe NaN NaN NaN 456 road 100 200 300
3 789 HoH Mrs. Jane Doe NaN NaN NaN 456 road 400 500 600
4 789 Spouse Mr. John Doe NaN NaN NaN 456 road NaN 10 30
期望的输出
ID Position Title First Last SpTitle SpFirst SpLast Address Value1 Value2 Value3
0 456 HoH Mr. John Doe Mrs. Jane Doe 123 street 20 NaN 60
1 789 HoH Mrs. Jane Doe Mr. John Doe 456 road 500 710 930
有数千行。有些 ID 在数十行中重复。我认为它会使用 groupby
和 agg
,但我似乎无法让它正常工作。
当 ID 匹配时,如何在求和值时将配偶移动到与户主 (HoH) 共享同一行?
这是我到目前为止所拥有的:
import pandas as pd
import numpy as np
# Combine sheets
df = pd.concat(pd.read_excel("C:/Users/Sheet.xlsx", sheet_name=None), ignore_index=True)
# Drop blank IDs
df = df[df['ID'].notna()]
# Insert Spouse columns
df.insert(loc = 10, column='SpTitle', value = '')
df.insert(loc = 11, column='SpFirstName', value = '')
df.insert(loc = 12, column='SpMiddleName', value = '')
df.insert(loc = 13, column='SpLastname', value = '')
df.insert(loc = 14, column='SpBirthDate', value = '')
df.insert(loc = 15, column='SpGender', value = '')
m = df.Position.eq("Spouse")
df.loc[m, ["SpTitle", "SpFirstName", "SpMiddleName", "SpLastName", "SpBirthDate", "SpGender" ]] = df.loc[
m, ["Title", "First Name", "Middle Name", "Last Name", "Gender", "Date of Birth"]
].values
df[["Value 2019", "Value 2020", "Value 2021", "Value 2022", "Fund 2019", "Fund 2020", "Fund 2022", "Fund 2021"]] = df.groupby("ID", as_index=False)[
["Value 2019", "Value 2020", "Value 2021", "Value 2022", "Fund 2019", "Fund 2020", "Fund 2022", "Fund 2021"]
].transform(np.sum, min_count=1)
df[["SpTitle", "SpFirstName", "SpMiddleName", "SpLastName", "SpBirthDate", "SpGender"]] = df.groupby("ID", as_index=False)[
["SpTitle", "SpFirstName", "SpMiddleName", "SpLastName", "SpBirthDate", "SpGender"]
].transform(lambda x: x.ffill().bfill())
df = df[~m].drop_duplicates()
df.to_csv("C:/Users/data.csv", index = False)
最佳答案
您可以在对ID
进行分组后对列进行不同的聚合,最后将具有配偶
的行的值填充到聚合输出中。
df = df.set_index('ID')
spouses = df.loc[df['Position'].eq('Spouse'), ['Title', 'First', 'Last']].values
agg_dict = {col : 'sum' if col in ['Value1', 'Value2', 'Value3'] else 'first' for col in df.columns.tolist()}
out = df.groupby(level=0).agg(agg_dict).reset_index()
out.loc[:, ['SpTitle', 'SpFirst', 'SpLast']] = spouses
print(out)
输出:
ID Position Title First Last SpTitle SpFirst SpLast Address Value1 Value2 Value3
0 456 HoH Mr. John Doe Mrs. Jane Doe 123 street 20.0 0.0 60
1 789 HoH Mrs. Jane Doe Mr. John Doe 456 road 500.0 710.0 930
关于python - 在Python中,如果ID匹配,则将行移动到列,同时对特定列求和,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/73806372/