I have the following DataFrame of customer protocols:
import pandas as pd

rng = pd.date_range('2020-12-01', periods=5, freq='D')
df = pd.DataFrame({"ID": ["1", "2", "1", "2", "2"],
                   "value": [100, 200, 300, 400, 500],
                   "status": ["active", "finished", "active", "finished", "active"],
                   "Date": rng})
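I need to build a new DataFrame computed from the df above:
- New1 = value of the last protocol with status "active"
- New2 = value of the last protocol with status "finished"
More precisely, the result should have one row per ID, with the columns ID, New1 and New2.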
Best answer
Try this, admittedly somewhat long, approach:
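# Split the rows by status.
df1 = df.loc[df['status'] == "active"]
df2 = df.loc[df['status'] == "finished"]
# Last value per ID; "last" follows row order, and df is already in Date order here.
df1 = df1.groupby("ID")['value'].last()
df2 = df2.groupby("ID")['value'].last()
# Align both results on the unique IDs; an ID with no matching row gets NaN.
IDs = df["ID"].drop_duplicates()
new_df = pd.DataFrame({"ID": IDs, "New1": df1.reindex(IDs).tolist(), "New2": df2.reindex(IDs).tolist()})
print(new_df)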
Output:
  ID  New1   New2
0  1   300    NaN
1  2   500  400.0
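For comparison, here is a more compact variant. It is a sketch, not part of the original answer. It assumes that "last" means latest by the Date column, so it sorts by Date explicitly instead of relying on row order, then takes the last value per (ID, status) pair and pivots the statuses into columns. The name `compact` and the mapping of "active" to New1 and "finished" to New2 are chosen here for illustration.

```python
import pandas as pd

rng = pd.date_range('2020-12-01', periods=5, freq='D')
df = pd.DataFrame({"ID": ["1", "2", "1", "2", "2"],
                   "value": [100, 200, 300, 400, 500],
                   "status": ["active", "finished", "active", "finished", "active"],
                   "Date": rng})

compact = (
    df.sort_values("Date")                      # make "last" mean "latest by Date"
      .groupby(["ID", "status"])["value"]
      .last()                                   # last value per (ID, status)
      .unstack("status")                        # statuses become columns
      .reindex(columns=["active", "finished"])  # fix column order, keep both even if one is absent
      .rename(columns={"active": "New1", "finished": "New2"})
      .rename_axis(columns=None)                # drop the leftover "status" axis label
      .reset_index()
)
print(compact)
```

Because `unstack` fills missing (ID, status) combinations with NaN, New1 may come out as a float column here even though every ID has an active protocol.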
On the topic "python - Calculating values based on dates in a Python Pandas DataFrame?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/65322384/