我发布了一个问题earlier并得到了答案。
您能解释一下 join final_df = pd.merge(df, temp_df.reset_index(), how="left").fillna(0)
的工作原理吗?我得到了正确的结果,但我不明白连接是如何发生的。 df 和 temp_df 之间没有公共(public)列。
工作代码如下:
d = {'emp': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c' ],
'date': ['1', '1', '1', '1', '2', '2', '2', '3', '3', '3', '3' ],
'usd':[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 ],
'expense type':['Car Mileage', 'Car Rental', 'Car Rental - Gas', 'food', 'Car Rental', 'Car Rental - Gas', 'food', 'Car Mileage', 'Car Rental', 'food', 'wine' ],
'zflag':['1', '1', '1', ' ',' ',' ',' ','2','2',' ',' ' ]
}
df = pd.DataFrame(data=d)
df
Out[253]:
date emp expense type usd zflag
0 1 a Car Mileage 1 1
1 1 a Car Rental 2 1
2 1 a Car Rental - Gas 3 1
3 1 a food 4
4 2 b Car Rental 5
5 2 b Car Rental - Gas 6
6 2 b food 7
7 3 c Car Mileage 8 2
8 3 c Car Rental 9 2
9 3 c food 10
10 3 c wine 11
temp_df = df.groupby(["emp", "date"], axis=0)["expense type"].apply(lambda x: 1 if "Car Mileage" in x.values and any([k in x.values for k in ["Car Rental", "Car Rental - Gas"]]) else 0).rename("zzflag")
temp_df = temp_df.loc[temp_df!=0,:].cumsum()
final_df = pd.merge(df, temp_df.reset_index(), how="left").fillna(0)
更新 1:
temp_df
没有索引,它是一个系列。所以我不确定如何按照评论中的建议在索引上进行连接。
temp_df
Out[335]:
emp date
a 1 1
c 3 2
Name: zzflag, dtype: int64
最佳答案
不带 on
或 index
参数的
pd.merge
正在加入公共(public)列名称:
Per Docs in pandas API on pd.merge查看“on”参数:
on : label or list Field names to join on. Must be found in both DataFrames. If on is None and not merging on indexes, then it merges on the intersection of the columns by default.
d = {'emp': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c' ],
'date': ['1', '1', '1', '1', '2', '2', '2', '3', '3', '3', '3' ],
'usd':[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 ],
'expense type':['Car Mileage', 'Car Rental', 'Car Rental - Gas', 'food', 'Car Rental', 'Car Rental - Gas', 'food', 'Car Mileage', 'Car Rental', 'food', 'wine' ],
'zflag':['1', '1', '1', ' ',' ',' ',' ','2','2',' ',' ' ]
}
df = pd.DataFrame(data=d)
temp_df = df.groupby(["emp", "date"], axis=0)["expense type"].apply(lambda x: 1 if "Car Mileage" in x.values and any([k in x.values for k in ["Car Rental", "Car Rental - Gas"]]) else 0).rename("zzflag")
temp_df = temp_df.loc[temp_df!=0,:].cumsum()
a = temp_df.reset_index()
all(pd.merge(df, a) == pd.merge(df, a, on=['emp','date']))
输出:
True
关于python - 合并在 pandas 中是如何工作的,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/50027208/