python - How to get the minimum values below a certain threshold in a dataframe?

Tags: python pandas dataframe indexing euclidean-distance

I have 2 dataframes in pandas containing location information for cars and trees.

df1

           x     y
    car
    3    216    13
    4    218    12
    5    217    12

df2

           x     y
    tree
    5    253   180
    6    241    24
    8    217    14

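How would I calculate the Euclidean distance between each car and each tree, and then keep only the pairs whose distance is less than some threshold, e.g. 5? I would like to create another dataframe containing the car number, the tree number, and the distance between the two (see below).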

df3

    car   tree   dist
    5     8      2.2

So far I can use

 distance = scipy.spatial.distance.cdist(df1, df2, metric='euclidean')

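to get the Euclidean distance between every car and every tree, but I'm struggling to select the values I need (i.e. distance < 5). Any help is appreciated, thanks!!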
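A minimal sketch of where the question gets stuck, assuming df1 and df2 are rebuilt from the tables above with `car` and `tree` as index names (taken from the printed frames): `cdist` returns an unlabeled NumPy array, and a plain boolean mask picks out the distances below 5 but flattens them, so the result no longer says which car and tree each value belongs to.

```python
import pandas as pd
from scipy.spatial.distance import cdist

# Rebuilt from the tables in the question (index names assumed from the printout)
df1 = pd.DataFrame({"x": [216, 218, 217], "y": [13, 12, 12]},
                   index=pd.Index([3, 4, 5], name="car"))
df2 = pd.DataFrame({"x": [253, 241, 217], "y": [180, 24, 14]},
                   index=pd.Index([5, 6, 8], name="tree"))

distance = cdist(df1, df2, metric="euclidean")
# Plain ndarray: one row per car, one column per tree, no labels attached
print(type(distance).__name__, distance.shape)

# Boolean masking keeps the values below the threshold but returns a flat
# 1-D array, so the car/tree pairing is lost
print(distance[distance < 5])
```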

Best answer

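import numpy as np
import pandas as pd
from scipy import spatial

distance = spatial.distance.cdist(df1, df2, metric='euclidean')  # shape (len(df1), len(df2))
idx = np.where(distance < 5)  # (row positions in df1, column positions in df2)
pd.DataFrame({"car": df1.iloc[idx[0]].index.values,
              "tree": df2.iloc[idx[1]].index.values,
              "dist": distance[idx]})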

   car  tree      dist
0    3     8  1.414214
1    4     8  2.236068
2    5     8  2.000000

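The (i, j) entry of the cdist result is the distance between the i-th item of the first set and the j-th item of the second set.
We use np.where to find the (i, j) pairs in distance that satisfy the condition distance < 5.
We build a new dataframe from the indices obtained in the previous step: idx[0] gives the rows of df1 we need to retrieve, and idx[1] gives the rows of df2 we need.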
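To make the two intermediate results in these steps concrete, here is a small sketch using plain NumPy arrays copied from df1 and df2 (the names `cars` and `trees` are illustrative). It checks that entry (i, j) of `cdist` equals the Euclidean distance computed by broadcasting, and shows that `np.where` on a 2-D boolean array returns one array of positions per axis.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Coordinates copied from df1 (cars 3, 4, 5) and df2 (trees 5, 6, 8)
cars = np.array([[216, 13], [218, 12], [217, 12]])
trees = np.array([[253, 180], [241, 24], [217, 14]])

distance = cdist(cars, trees, metric="euclidean")

# Entry (i, j) is the distance between cars[i] and trees[j];
# broadcasting (3, 1, 2) - (1, 3, 2) -> (3, 3, 2) rebuilds the same matrix
manual = np.sqrt(((cars[:, None, :] - trees[None, :, :]) ** 2).sum(axis=-1))
print(np.allclose(distance, manual))  # True

# On a 2-D boolean array, np.where returns (row positions, column positions)
rows, cols = np.where(distance < 5)
print(rows)  # [0 1 2] -> positions in df1, i.e. cars 3, 4, 5
print(cols)  # [2 2 2] -> position in df2, i.e. tree 8
print(distance[rows, cols])  # the matching distances
```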
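An alternative to mapping positions back through `iloc` (a different technique from the answer's, not its method) is to attach the labels to the distance matrix itself and reshape it. This sketch assumes the same rebuilt df1 and df2 with index names `car` and `tree`; `threshold` is a hypothetical variable standing in for the 5 from the question. `DataFrame.stack()` turns the labeled matrix into a Series indexed by (car, tree), which then flattens directly into the car / tree / dist layout.

```python
import pandas as pd
from scipy.spatial.distance import cdist

df1 = pd.DataFrame({"x": [216, 218, 217], "y": [13, 12, 12]},
                   index=pd.Index([3, 4, 5], name="car"))
df2 = pd.DataFrame({"x": [253, 241, 217], "y": [180, 24, 14]},
                   index=pd.Index([5, 6, 8], name="tree"))
threshold = 5  # hypothetical name for the cutoff in the question

# Car labels on the rows, tree labels on the columns
dist_df = pd.DataFrame(cdist(df1, df2), index=df1.index, columns=df2.index)

# Series indexed by (car, tree) -> columns car, tree, dist
pairs = dist_df.stack().rename("dist").reset_index()

# Keep only pairs closer than the threshold
df3 = pairs[pairs["dist"] < threshold].reset_index(drop=True)
print(df3)
```

Like `cdist` itself, this materializes every car/tree pair, so memory grows with len(df1) × len(df2).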

For python - How to get the minimum values below a certain threshold in a dataframe?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48671317/
