python - 如何将缺失行的记录添加到 Dataframe

我有一个数据框: df

    Name        Date ID  Amount
0   Faye  2018-12-31  A       2
1   Faye  2019-03-31  A       1
2   Faye  2019-06-30  A       5
3   Faye  2019-09-30  B       2
4   Faye  2019-09-30  C       2
5   Faye  2019-12-31  A       4
6   Faye  2020-03-01  A       1
7   Faye  2020-03-01  B       1
8   Mike  2018-12-31  A       4
9   Mike  2019-03-31  A       4
10  Mike  2019-06-30  B       3

对于每个 Name、Date、ID，我计算了 Amount 的百分比变化新列中的上一个 Date。如果没有以前的条目，那么我添加 New:

df['% Change'] = (df.sort_values('Date').groupby(['Name', 'ID']).Amount.pct_change())
df['% Change'] = df['% Change'].fillna('New')

但我还想为相反的情况创建一个条目，其中 Name、Date、ID、group did 以前存在，但在下一个日期不存在；这样输出看起来像:

    Name        Date ID  Amount % Change
0   Faye  2018-12-31  A       2      New
1   Faye  2019-03-31  A       1     -0.5
2   Faye  2019-06-30  A       5        4
3   Faye  2019-09-30  A       0     Sold
4   Faye  2019-09-30  B       2      New
5   Faye  2019-09-30  C       2      New
6   Faye  2019-12-31  A       4      New
7   Faye  2020-03-01  A       1    -0.75
8   Faye  2020-03-01  B       1     -0.5
9   Mike  2018-12-31  A       4      New
10  Mike  2019-03-31  A       4        0
11  Mike  2019-06-30  A       0     Sold
12  Mike  2019-06-30  B       3      New

如果有帮助，我正在尝试模拟 this site 的方式处理此类案件。

最佳答案

解决方法:

# run you original code to ide
df['% Change'] = (df.sort_values('Date').groupby(['Name', 'ID']).Amount.pct_change())
df['% Change'] = df['% Change'].fillna('New')

# Create a dataframe of all te dates. 
all_dates = pd.DataFrame({"Date": df["Date"].unique()}) 
all_dates["one"] = 1

# Create a dasta frame of all the possible recored (all combinations of id-name-date)
name_ids = df[["Name", "ID"]].drop_duplicates()
name_ids["one"] = 1

all_possible_records = pd.merge(all_dates, name_ids, on="one")
all_possible_records = pd.merge(all_possible_records, df, on = ["Date", "Name", "ID"], how ="left")
all_possible_records.drop("one", axis = "columns", inplace = True)
all_possible_records.sort_values(["Name", "ID", "Date"], inplace=True)

# For every record, shift 1 to see if it had any value in the previous querter. 
all_possible_records["prev_q"] = all_possible_records.groupby(["Name", "ID"]).shift(1)["Amount"]

# records in which change is NaN - but there was a value in the previous querter - are 'sold'
all_possible_records.loc[all_possible_records["% Change"].isna() & all_possible_records.prev_q.notna(), "% Change"]="Sold"

# Drop redundent records. 
res = all_possible_records.dropna(axis="rows", subset=["% Change"])
res

结果是:

关于python - 如何将缺失行的记录添加到 Dataframe，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/61843328/

python - 如何将缺失行的记录添加到 Dataframe

上一篇：java - OpenJFX Maven 原型(prototype)不存在？

下一篇：react-native - 拍摄 View 后在 facebook 上分享 - 不在 facebook 上显示图像