python - 如何将缺失行的记录添加到 Dataframe

标签 python pandas

我有一个数据框: df

    Name        Date ID  Amount
0   Faye  2018-12-31  A       2
1   Faye  2019-03-31  A       1
2   Faye  2019-06-30  A       5
3   Faye  2019-09-30  B       2
4   Faye  2019-09-30  C       2
5   Faye  2019-12-31  A       4
6   Faye  2020-03-01  A       1
7   Faye  2020-03-01  B       1
8   Mike  2018-12-31  A       4
9   Mike  2019-03-31  A       4
10  Mike  2019-06-30  B       3

对于每个 NameDateID,我计算了 Amount 的百分比变化新列中的上一个 Date。如果没有以前的条目,那么我添加 New:

df['% Change'] = (df.sort_values('Date').groupby(['Name', 'ID']).Amount.pct_change())
df['% Change'] = df['% Change'].fillna('New')

但我还想为相反的情况创建一个条目,其中 NameDateID、group did 以前存在,但在下一个日期不存在;这样输出看起来像:

    Name        Date ID  Amount % Change
0   Faye  2018-12-31  A       2      New
1   Faye  2019-03-31  A       1     -0.5
2   Faye  2019-06-30  A       5        4
3   Faye  2019-09-30  A       0     Sold
4   Faye  2019-09-30  B       2      New
5   Faye  2019-09-30  C       2      New
6   Faye  2019-12-31  A       4      New
7   Faye  2020-03-01  A       1    -0.75
8   Faye  2020-03-01  B       1     -0.5
9   Mike  2018-12-31  A       4      New
10  Mike  2019-03-31  A       4        0
11  Mike  2019-06-30  A       0     Sold
12  Mike  2019-06-30  B       3      New

如果有帮助,我正在尝试模拟 this site 的方式处理此类案件。

最佳答案

解决方法:

# run you original code to ide
df['% Change'] = (df.sort_values('Date').groupby(['Name', 'ID']).Amount.pct_change())
df['% Change'] = df['% Change'].fillna('New')

# Create a dataframe of all te dates. 
all_dates = pd.DataFrame({"Date": df["Date"].unique()}) 
all_dates["one"] = 1

# Create a dasta frame of all the possible recored (all combinations of id-name-date)
name_ids = df[["Name", "ID"]].drop_duplicates()
name_ids["one"] = 1

all_possible_records = pd.merge(all_dates, name_ids, on="one")
all_possible_records = pd.merge(all_possible_records, df, on = ["Date", "Name", "ID"], how ="left")
all_possible_records.drop("one", axis = "columns", inplace = True)
all_possible_records.sort_values(["Name", "ID", "Date"], inplace=True)

# For every record, shift 1 to see if it had any value in the previous querter. 
all_possible_records["prev_q"] = all_possible_records.groupby(["Name", "ID"]).shift(1)["Amount"]

# records in which change is NaN - but there was a value in the previous querter - are 'sold'
all_possible_records.loc[all_possible_records["% Change"].isna() & all_possible_records.prev_q.notna(), "% Change"]="Sold"

# Drop redundent records. 
res = all_possible_records.dropna(axis="rows", subset=["% Change"])
res

结果是:

enter image description here

关于python - 如何将缺失行的记录添加到 Dataframe,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/61843328/

相关文章:

python - 使用 Scapy 逐步剖析有效负载的每个部分

python - Matplotlib 中的 X 轴

Python 使用箭头更改区域设置

python - Pandas 描述 0.18.0 vs Pandas 描述 0.17.0

python - 为什么 'na_values' 在 Pandas 中不能正常工作?

python - 对行(但不是全部)求和,给其他行默认值

python - 如何在字典上使用 list extend() 和 map 函数

python - 使用生成器在 python 中线程化

python - 用平均值替换缺失值

Python从另一列中提取单词周围的单词