python - How to find the intersection of two pandas Series within an error tolerance

I have two pandas DataFrames:

df1 = pd.DataFrame({'col1': [1.2574, 5.3221, 4.3215, 9.8841], 'col2': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'col1': [4.326, 9.89, 5.326, 1.2654], 'col2': ['w', 'x', 'y', 'z']})

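Now I want to compare the values in col1 of the two DataFrames. Take 5.3221 from df1: I want to check whether that value exists in df2['col1'] within an error of 0.005 (in this example, 5.326 from df2['col1'] should be considered equal to 5.3221), and then create a third DataFrame that holds both columns from df1 and df2 for the rows where this condition is true.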

The expected output is:

    col1    col2    col1.1  col2.2
0   5.3221  b       5.326   y
1   4.3215  c       4.326   w

I defined a function that handles the error tolerance:

def close(a, b, e=0.005):
    return round(abs(a - b), 3) <= e

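But I don't know how to apply it to the data without using a for loop. I also know I could use numpy.intersect1d, but I don't know how to use it.

Any help would be appreciated :)

Edit: The answer in the suggested duplicate does not solve my problem. That question only combines two DataFrames based on similar indexes. Also, difflib is for finding word matches, not numbers. My case is completely different.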

Best answer

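I have added the code, with comments.

First compute the distance between every pair of values across the two columns, then filter by the tolerance. Take the matching rows and combine them.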
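
The pairwise step can be looked at on its own before it is applied to the DataFrames. The following is a minimal sketch using the col1 values from the question as plain arrays; the names a, b, diff, mask and pairs are chosen here for illustration only.

```python
import numpy as np

a = np.array([1.2574, 5.3221, 4.3215, 9.8841])  # values of df1['col1']
b = np.array([4.326, 9.89, 5.326, 1.2654])      # values of df2['col1']

# a[:, np.newaxis] has shape (4, 1); subtracting b (shape (4,)) broadcasts
# to a (4, 4) matrix where diff[i, j] == |a[i] - b[j]|
diff = np.abs(a[:, np.newaxis] - b)

# True wherever a pair of values is closer than the tolerance
mask = diff < 0.005

# Each row of pairs is (position in a, position in b) of one match
pairs = np.argwhere(mask)
print(pairs)
# [[1 2]
#  [2 0]]
```

The intermediate matrix has len(a) * len(b) entries, so this approach is best suited to columns that are small enough for that matrix to fit in memory. The full code below applies the same steps to the two DataFrames.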

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'col1': [1.2574, 5.3221, 4.3215, 9.8841], 'col2': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'col1': [4.326, 9.89, 5.326, 1.2654], 'col2': ['w', 'x', 'y', 'z']})

# Get the target columns
c11 = df1['col1'].to_numpy()
c21 = df2['col1'].to_numpy()

# Compute all pairwise absolute differences via broadcasting and keep the pairs within the tolerance.
# Each row of c is (row position in df1, row position in df2) of a match to put in the new DataFrame.
c = np.argwhere(np.abs(c11[:, np.newaxis] - c21) < 0.005)


x = pd.DataFrame()
# Reset the index before assigning; otherwise pandas aligns on the original index labels and the rows end up mismatched
x[['col1', 'col2']] = df1.iloc[c[:, 0]][['col1', 'col2']].reset_index(drop=True)
x[['col1.1', 'col2.2']] = df2.iloc[c[:, 1]][['col1', 'col2']].reset_index(drop=True)

print(x)
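
If the same matching is needed in more than one place, the steps above could be wrapped in a small function. This is only a sketch under a few assumptions: the helper name match_within_tolerance, its parameters, and the "_right" column suffix are hypothetical and not part of the original answer, and the columns are combined with pd.concat instead of being assigned one pair at a time.

```python
import numpy as np
import pandas as pd


def match_within_tolerance(left, right, on="col1", tol=0.005):
    """Pair rows of left and right whose `on` values differ by less than tol.

    Hypothetical helper written for illustration only.
    """
    lv = left[on].to_numpy()
    rv = right[on].to_numpy()
    # Rows of idx are (row position in left, row position in right)
    idx = np.argwhere(np.abs(lv[:, np.newaxis] - rv) < tol)
    left_rows = left.iloc[idx[:, 0]].reset_index(drop=True)
    right_rows = right.iloc[idx[:, 1]].reset_index(drop=True)
    # add_suffix renames every column of the right side, e.g. col1 -> col1_right
    return pd.concat([left_rows, right_rows.add_suffix("_right")], axis=1)


df1 = pd.DataFrame({'col1': [1.2574, 5.3221, 4.3215, 9.8841], 'col2': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'col1': [4.326, 9.89, 5.326, 1.2654], 'col2': ['w', 'x', 'y', 'z']})

result = match_within_tolerance(df1, df2)
print(result)
```

The comparison here is strict (<), as in the code above. The close function in the question rounds the difference to three decimals and uses <=, so values that sit right at the tolerance boundary may be treated differently.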

Regarding "python - How to find the intersection of two pandas Series within an error tolerance", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/68559606/
