python - pandas: build a row-by-row comparison matrix

I have two DataFrames, one of shape (10,2) and one of shape (3,2), and I am looking for a faster / more Pythonic way to compare them row by row.

import numpy as np
import pandas as pd
x = pd.DataFrame([range(10),range(2,12)])
x = x.transpose()
y = pd.DataFrame([[5,8],[2,3],[5,5]])

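I want to build a (10,3) comparison matrix that shows which rows of the first DataFrame satisfy the following requirement against each row of the second: the x[1] value must be >= the y[0] value, and the x[0] value must be <= the y[1] value. In reality the data are dates, but I use integers here to keep the example simple. We are testing for overlapping time periods, so a pair of rows matches only when the two periods overlap.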

arr = np.zeros((len(x),len(y)), dtype=bool)
for xrow in x.index:
    for yrow in y.index:
        if x.loc[xrow,1] >= y.loc[yrow,0] and x.loc[xrow,0] <= y.loc[yrow,1]:
            arr[xrow,yrow] = True
arr

The brute-force approach above is too slow. Any suggestions on how to vectorize it, or to do some kind of transposed matrix comparison?

Best answer

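You can convert x and y to NumPy arrays and then extend their dimensions with np.newaxis/None, which brings NumPy's broadcasting into play when the comparisons are performed. All of the comparisons, and the resulting bool array, are then produced in a vectorized manner. The implementation would look like this: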

X = np.asarray(x) 
Y = np.asarray(y)
arr = (X[:,None,1] >= Y[:,0]) & (X[:,None,0] <= Y[:,1])
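
To make the broadcasting step easier to follow, here is a small sketch that spells out the intermediate shapes and cross-checks the result against the double loop from the question. The names x_end, y_start, left, right and expected are illustrative only and are not part of the original answer.

```python
import numpy as np

# Same toy data as in the question, built directly as arrays.
X = np.column_stack([np.arange(10), np.arange(2, 12)])  # shape (10, 2)
Y = np.array([[5, 8], [2, 3], [5, 5]])                  # shape (3, 2)

x_end = X[:, None, 1]   # shape (10, 1): second column of X, kept as a column
y_start = Y[:, 0]       # shape (3,): treated as (1, 3) during broadcasting

left = x_end >= y_start             # (10, 1) vs (3,)  -> (10, 3)
right = X[:, None, 0] <= Y[:, 1]    # (10, 1) vs (3,)  -> (10, 3)
arr = left & right

# Cross-check against the explicit double loop.
expected = np.zeros((len(X), len(Y)), dtype=bool)
for i in range(len(X)):
    for j in range(len(Y)):
        expected[i, j] = X[i, 1] >= Y[j, 0] and X[i, 0] <= Y[j, 1]

print(np.array_equal(arr, expected))
```

Each comparison pairs every row of X with every row of Y, so the intermediate arrays grow as len(X) * len(Y); for very large inputs that memory cost may matter.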

Sample run:

In [207]: x = pd.DataFrame([range(10),range(2,12)])
     ...: x = x.transpose()
     ...: y = pd.DataFrame([[5,8],[2,3],[5,5]])
     ...: 

In [208]: X = np.asarray(x) 
     ...: Y = np.asarray(y)
     ...: arr = (X[:,None,1] >= Y[:,0]) & (X[:,None,0] <= Y[:,1])
     ...: 

In [209]: arr
Out[209]: 
array([[False,  True, False],
       [False,  True, False],
       [False,  True, False],
       [ True,  True,  True],
       [ True, False,  True],
       [ True, False,  True],
       [ True, False, False],
       [ True, False, False],
       [ True, False, False],
       [False, False, False]], dtype=bool)
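
Since the question mentions that the real data are dates, the following is a rough sketch of the same overlap test on datetime columns. The column names "start" and "end" and the sample dates are made up for illustration and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical periods; "start"/"end" column names are assumptions.
x = pd.DataFrame({
    "start": pd.to_datetime(["2015-01-01", "2015-01-10", "2015-02-01"]),
    "end":   pd.to_datetime(["2015-01-05", "2015-01-20", "2015-02-10"]),
})
y = pd.DataFrame({
    "start": pd.to_datetime(["2015-01-04", "2015-01-25"]),
    "end":   pd.to_datetime(["2015-01-12", "2015-01-31"]),
})

# Pull out datetime64 arrays; comparisons broadcast just like integers.
xs, xe = x["start"].to_numpy(), x["end"].to_numpy()
ys, ye = y["start"].to_numpy(), y["end"].to_numpy()

# Periods overlap when x ends on/after y starts and x starts on/before y ends.
overlap = (xe[:, None] >= ys) & (xs[:, None] <= ye)

# Wrap the result so rows and columns carry the original indexes.
result = pd.DataFrame(overlap, index=x.index, columns=y.index)
print(result)
```

Wrapping the boolean array back into a DataFrame is optional; it only keeps the row labels of x and y attached to the matrix.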

For "python - pandas: build a row-by-row comparison matrix", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33269777/
