df1 = pd.DataFrame({"fields": [["boy", "apple", "toy", "orange", "bear", "eat"],
                               ["orange", "girl", "red"]]})
df2 = pd.DataFrame({"other fields": [["boy", "girl", "orange"]]})
I want to add a column to df1 indicating whether each field also appears in the other fields. Sample output:
| fields | overlap? |
|--------|----------|
| boy    | Y        |
| apple  | N        |
| toy    | N        |
| orange | Y        |
| bear   | N        |
| eat    | N        |
| orange | Y        |
| girl   | Y        |
| red    | N        |
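My first step would be to explode the fields in df1, but I'm not sure how to then check for overlapping values between the two DataFrames. Thanks!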
Best answer
You can use `isin` to find the values that overlap between the two DataFrames, then use `np.where` to convert the resulting booleans to `Y`/`N`.
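import numpy as np
import pandas as pd

df1 = pd.DataFrame({"fields": [["boy", "apple", "toy", "orange", "bear", "eat"], ["orange", "girl", "red"]]})
df2 = pd.DataFrame({"other fields": [["boy", "girl", "orange"]]})
# Expand each list into one row per field, with a fresh 0..n-1 index
df1 = df1.explode('fields', ignore_index=True)
# Flag fields that also appear in df2's flattened lists
df1['overlap'] = np.where(df1['fields'].isin(df2['other fields'].explode()), 'Y', 'N')
print(df1)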
Output:
   fields overlap
0     boy       Y
1   apple       N
2     toy       N
3  orange       Y
4    bear       N
5     eat       N
6  orange       Y
7    girl       Y
8     red       N
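For comparison, here is a minimal sketch of a variant that avoids NumPy. It is not part of the original answer. It assumes the same sample data, collects the values from df2 into a set, and maps the booleans to Y/N with `Series.map`. The names `other` and `result` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"fields": [["boy", "apple", "toy", "orange", "bear", "eat"],
                               ["orange", "girl", "red"]]})
df2 = pd.DataFrame({"other fields": [["boy", "girl", "orange"]]})

# Flatten every list in df2 into a single set of values (hypothetical helper name)
other = set(df2["other fields"].explode())

# One row per field, then translate True/False into 'Y'/'N'
result = df1.explode("fields", ignore_index=True)
result["overlap"] = result["fields"].isin(other).map({True: "Y", False: "N"})
print(result)
```

Because `other` is built from every row of df2, the same code should also work if df2 holds more than one list.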
Regarding python - detecting overlapping values in 2 DataFrames, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/73729379/