pandas - How to get the distance between two geographic coordinates of two different dataframes?

Tags: pandas dataframe geopy

I am working on a university project and have two pandas dataframes:

      # Libraries
      import pandas as pd
      from geopy import distance

      # Dataframes

      df1 = pd.DataFrame({'id': [1, 2, 3],
                          'lat': [-23.48, -22.94, -23.22],
                          'long': [-46.36, -45.40, -45.80]})

      df2 = pd.DataFrame({'id': [100, 200, 300],
                          'lat': [-28.48, -22.94, -23.22],
                          'long': [-46.36, -46.40, -45.80]})

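I need to compute the distances between the latitude/longitude coordinates of the two dataframes, so I used geopy. If the distance for any combination of coordinates is below a threshold of 100 meters, I have to assign the value 1 in the 'nearby' column of df1. I wrote the following code: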

      threshold = 100  # meters

      df1['nearby'] = 0

      for i in range(0, len(df1)):
          for j in range(0, len(df2)):

              coord_geo_1 = (df1['lat'].iloc[i], df1['long'].iloc[i])
              coord_geo_2 = (df2['lat'].iloc[j], df2['long'].iloc[j])

              var_distance = (distance.distance(coord_geo_1, coord_geo_2).km) * 1000 

              if(var_distance < threshold):
                   df1['nearby'].iloc[i] = 1

The code raises a warning but works. However, I would like to find a way to replace the for() loops. Is that possible?

       # Output:

       id   lat       long  nearby
        1   -23.48  -46.36    0
        2   -22.94  -45.40    0
        3   -23.22  -45.80    1

Best answer

If you can use the scikit-learn library, its haversine_distances function computes the pairwise distances between two sets of coordinates. So you get:

import numpy as np
from sklearn.metrics.pairwise import haversine_distances

# threshold in meters; change as needed
threshold = 100  # meters

# mean Earth radius, used to turn the angular result into meters
earth_radius = 6371000  # meters

df1['nearby'] = (
    # get the distance between all points of each DF
    haversine_distances(
        # haversine_distances expects [lat, long] in radians, so convert from degrees
        X=np.radians(df1[['lat', 'long']].to_numpy()),
        Y=np.radians(df2[['lat', 'long']].to_numpy()))
    # the result is in radians on the unit sphere; convert it to meters
    * earth_radius
    # compare to your threshold
    < threshold
    # check whether any point from df2 is near each point of df1
    ).any(axis=1).astype(int)

print(df1)

#    id    lat   long  nearby
# 0   1 -23.48 -46.36       0
# 1   2 -22.94 -45.40       0
# 2   3 -23.22 -45.80       1
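
For readers who cannot install scikit-learn, the same pairwise computation can be sketched with NumPy broadcasting alone. This is only an illustration of the haversine formula, not the library's implementation; the helper name `haversine_matrix_m` is made up for this example, and it assumes a spherical Earth with the same 6371000 m radius used above.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters (spherical approximation)


def haversine_matrix_m(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_M):
    """Pairwise great-circle distances in meters; all inputs are in degrees.

    Hypothetical helper for illustration: returns an array of shape
    (len(lat1), len(lat2)).
    """
    # Column vectors for the first set, row vectors for the second set,
    # so that broadcasting produces every (i, j) combination.
    lat1 = np.radians(np.asarray(lat1, dtype=float))[:, None]
    lon1 = np.radians(np.asarray(lon1, dtype=float))[:, None]
    lat2 = np.radians(np.asarray(lat2, dtype=float))[None, :]
    lon2 = np.radians(np.asarray(lon2, dtype=float))[None, :]

    # Haversine formula:
    # a = sin^2(dlat / 2) + cos(lat1) * cos(lat2) * sin^2(dlon / 2)
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    # Clip guards against rounding pushing a slightly outside [0, 1].
    return 2 * radius * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


df1 = pd.DataFrame({'id': [1, 2, 3],
                    'lat': [-23.48, -22.94, -23.22],
                    'long': [-46.36, -45.40, -45.80]})
df2 = pd.DataFrame({'id': [100, 200, 300],
                    'lat': [-28.48, -22.94, -23.22],
                    'long': [-46.36, -46.40, -45.80]})

threshold = 100  # meters

dist = haversine_matrix_m(df1['lat'], df1['long'], df2['lat'], df2['long'])
# A row of df1 is "nearby" if at least one point of df2 is within the threshold.
df1['nearby'] = (dist < threshold).any(axis=1).astype(int)
print(df1)
```

Note that the distance matrix has len(df1) × len(df2) entries, so this approach (like haversine_distances) is only practical while that product fits in memory.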

Edit: the OP asked for a version that uses geopy's distance, so here is one way.

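# still calls geopy once per pair of points, so it is not faster than the loops,
# but it builds the whole distance matrix in a single expression
df1['nearby'] = (np.array(
    [[(distance.distance(coord1, coord2).km)
      for coord2 in df2[['lat', 'long']].to_numpy()]
     for coord1 in df1[['lat', 'long']].to_numpy()]
     ) * 1000 < threshold  # km -> meters, then compare to the threshold
).any(axis=1).astype(int)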
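
On the warning mentioned in the question: the question does not quote it, but it most likely comes from the chained assignment `df1['nearby'].iloc[i] = 1`, which first selects a column and then writes into that intermediate object. A minimal sketch of the usual alternative, a single `.loc` call on the dataframe itself, is shown below. The `is_near` list is a hypothetical stand-in for the result of the distance check, so that the example runs without geopy.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3],
                    'lat': [-23.48, -22.94, -23.22],
                    'long': [-46.36, -45.40, -45.80]})
df1['nearby'] = 0

# Hypothetical outcome of the distance check for each row of df1
# (stands in for the geopy comparison inside the original loops).
is_near = [False, False, True]

for i, near in enumerate(is_near):
    if near:
        # Chained form from the question, which may warn:
        #     df1['nearby'].iloc[i] = 1
        # Single indexing call on the dataframe, addressing row label and column:
        df1.loc[df1.index[i], 'nearby'] = 1

print(df1)
```

With Copy-on-Write enabled in recent pandas versions, the chained form does not update df1 at all, so the single `.loc` call is the safer choice even when the loops are kept.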

Regarding "pandas - How to get the distance between two geographic coordinates of two different dataframes?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/70941094/
