python - How to optimize Python code that calculates the distance between two GPS points

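I am looking for a faster way to calculate the distance between two GPS points, each given as a latitude and longitude, in Python. This is my code, and I would like to optimize it so that it runs faster.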

from math import radians, sin, cos, atan2, sqrt

def CalcDistanceKM(lat1, lon1, lat2, lon2):
    # Convert degrees to radians
    lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # Haversine formula
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    distance = 6371 * c  # mean Earth radius in km

    return distance

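The code calculates the distance between pairs of latitude/longitude coordinates taken from two different Excel (CSV) files and returns the distance between them.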

More code to explain the behavior:

for i in range(len(File1)):
    for j in range(len(File2)):
        if File1['AA'][i] == File2['BB'][j]:
            distance = CalcDistanceKM(File2['LATITUDE'][j], File2['LONGITUDE'][j],
                                      File1['Latitude'][i], File1['Longitude'][i])
            # DataFrame.append needs ignore_index=True for a dict row
            # (and was removed in pandas 2.0)
            File3 = File3.append({'DistanceBetweenTwoPoints': distance}, ignore_index=True)

Thanks.

Best answer

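Prepare your points as numpy arrays, then call a haversine function once on the prepared arrays to benefit from C performance and vectorization, both of which come for free with the excellent numpy library: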


import numpy as np

def haversine(x1: np.ndarray,
              x2: np.ndarray,
              y1: np.ndarray,
              y2: np.ndarray
              ) -> np.ndarray:
    """
    Compute the haversine distance between coords (x1, y1) and (x2, y2).

    Input in degrees, arrays or numbers.

    Parameters
    ----------
    x1 : np.ndarray
        X/longitude in degrees for coords pair 1.
    x2 : np.ndarray
        X/longitude in degrees for coords pair 2.
    y1 : np.ndarray
        Y/latitude in degrees for coords pair 1.
    y2 : np.ndarray
        Y/latitude in degrees for coords pair 2.

    Returns
    -------
    np.ndarray or float
        Haversine distance (meters) between the two given points.
    """
    x1 = np.deg2rad(x1)
    x2 = np.deg2rad(x2)
    y1 = np.deg2rad(y1)
    y2 = np.deg2rad(y2)
    # 12730000 m approximates Earth's diameter (2 * R); 2 * 6371000 = 12742000
    # would match the 6371 km radius used in the question.
    return 12730000*np.arcsin(((np.sin((y2-y1)*0.5)**2) + np.cos(y1)*np.cos(y2)*np.sin((x2-x1)*0.5)**2)**0.5)
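
As a quick sanity check, the sketch below compares a vectorized haversine against the scalar formula from the question. The names `haversine_km` and `scalar_km`, the sample coordinates, and the choice of the question's 6371 km radius are assumptions for illustration, not part of the original answer.

```python
import math

import numpy as np

EARTH_RADIUS_KM = 6371.0  # same mean radius as in the question (assumption)


def haversine_km(lon1, lon2, lat1, lat2):
    """Vectorized haversine distance in km; inputs in degrees (arrays or scalars)."""
    lon1, lon2, lat1, lat2 = map(np.deg2rad, (lon1, lon2, lat1, lat2))
    a = (np.sin((lat2 - lat1) * 0.5) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) * 0.5) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def scalar_km(lat1, lon1, lat2, lon2):
    """Scalar reference using the same formula as CalcDistanceKM in the question."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371 * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


# Made-up sample points (degrees); one call handles every pair at once.
lats1 = np.array([0.0, 48.8566, 40.7128])
lons1 = np.array([0.0, 2.3522, -74.0060])
lats2 = np.array([1.0, 51.5074, 34.0522])
lons2 = np.array([0.0, -0.1278, -118.2437])

vectorized = haversine_km(lons1, lons2, lats1, lats2)
looped = np.array([scalar_km(a, b, c, d)
                   for a, b, c, d in zip(lats1, lons1, lats2, lons2)])
```

Both arrays should agree to floating-point precision; the difference is that the vectorized version makes a single pass through numpy instead of one Python call per pair.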

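I see that you iterate over file 1 and file 2 in nested loops. Are you searching for matches there? Python for loops are very slow, so this will be a big bottleneck, but I can't help much without more information about the CSVs being used and how records in file1 are matched to those in file2. Maybe add the first few records from both files to the question to give some context?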

Update: thanks for the colab link.

You start with two dataframes, drive_test and Cells. One of your "if" conditions:

if drive_test['Serving Cell Identity'][i] == Cells['CI'][j] \
  or drive_test['Serving Cell Identity'][i] == Cells['PCIG'][j] \
  and drive_test['E_ARFCN'][i] == Cells['EARFCN_DL'][j]:
# btw this is easy to misread, so use brackets: `and` binds tighter than `or`, so Python reads this as a or (b and c), which may not be the intention.
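
The precedence rule is easy to check directly. In this small sketch the values of `a`, `b`, and `c` are arbitrary placeholders chosen so that the two possible groupings give different results.

```python
# `and` binds tighter than `or`, so `a or b and c` groups as `a or (b and c)`.
a, b, c = True, False, False

unbracketed = a or b and c      # how Python parses the bare expression
or_first = (a or b) and c       # explicit "(a or b) and c"
and_first = a or (b and c)      # explicit "a or (b and c)"
```

Here `unbracketed` equals `and_first` and differs from `or_first`, which is why explicit brackets are worth adding to the condition.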

Since this is effectively a cross merge, it can be written as a pandas merge followed by a filter (see "Create combination of two pandas dataframes in two dimensions"):

new_df = drive_test.assign(merge_key = 1).merge(Cells.assign(merge_key = 1), on = 'merge_key', suffixes = ("", "")).drop('merge_key', axis = 1)
# will need to use suffixes if your dataframes have common column names

cond1_df = new_df[((new_df['Serving Cell Identity'] == new_df.CI) | (new_df['Serving Cell Identity'] == new_df.PCIG)) & (new_df.E_ARFCN == new_df.EARFCN_DL)]
cond1_df = cond1_df.assign(distance_between = haversine(cond1_df.Longitude.to_numpy(), cond1_df.LONGITUDE.to_numpy(), cond1_df.Latitude.to_numpy(), cond1_df.LATITUDE.to_numpy()))
# note that my haversine input args are differently ordered to yours

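This should give you all the results for the first condition, and you can repeat it for the remaining conditions. I could not test this on your CSVs, so it may need some debugging, but the idea should be sound.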
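
For reference, here is a self-contained sketch of the cross-merge-and-filter idea on made-up data. The values are invented, the column names follow the ones used in this answer, and it assumes pandas 1.2 or later, where `merge(how="cross")` can stand in for the `merge_key` trick.

```python
import pandas as pd

# Made-up stand-ins for the two CSV files.
drive_test = pd.DataFrame({
    "Serving Cell Identity": [10, 20, 30],
    "E_ARFCN": [100, 200, 300],
    "Latitude": [30.00, 30.10, 30.20],
    "Longitude": [31.00, 31.10, 31.20],
})
Cells = pd.DataFrame({
    "CI": [10, 99],
    "PCIG": [77, 20],
    "EARFCN_DL": [100, 200],
    "LATITUDE": [30.05, 30.15],
    "LONGITUDE": [31.05, 31.15],
})

# Every drive_test row paired with every Cells row:
# len(drive_test) * len(Cells) rows in total.
pairs = drive_test.merge(Cells, how="cross")

# Keep only the pairs that satisfy (CI match or PCIG match) and EARFCN match.
same_cell = (pairs["Serving Cell Identity"] == pairs["CI"]) | (
    pairs["Serving Cell Identity"] == pairs["PCIG"]
)
same_freq = pairs["E_ARFCN"] == pairs["EARFCN_DL"]
matches = pairs[same_cell & same_freq]
```

The intermediate `pairs` frame grows with the product of the two row counts, which is the memory cost to keep in mind for large files.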

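Note that, depending on how big your CSVs are, this can blow up into a very large dataframe and exhaust your RAM. In that case you are more or less stuck iterating row by row, unless you take a piecewise approach: iterate over the rows of one dataframe and, for each row, match all rows of the other dataframe against the condition at once. That is still faster than iterating over both dataframes, but probably slower than doing everything in one go.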
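
Because each condition is a set of equality tests, one possible way to sidestep the memory blow-up is to let pandas join on the matching columns directly, so that non-matching pairs are never materialized. This is only a sketch on made-up data and has not been tried on the original CSVs; it implements the "(CI or PCIG) and EARFCN" reading of the condition, and the `row_id` and `cell_id` helper columns are hypothetical names introduced here.

```python
import pandas as pd

# Made-up stand-ins for the two CSV files.
drive_test = pd.DataFrame({
    "Serving Cell Identity": [10, 20, 30],
    "E_ARFCN": [100, 200, 300],
})
Cells = pd.DataFrame({
    "CI": [10, 99, 30],
    "PCIG": [77, 20, 30],
    "EARFCN_DL": [100, 200, 300],
})

# Keep the original row positions so duplicate pairs can be detected later.
left = drive_test.reset_index().rename(columns={"index": "row_id"})
right = Cells.reset_index().rename(columns={"index": "cell_id"})

# One equi-join per alternative in the "or": match on CI, then on PCIG.
by_ci = left.merge(
    right,
    left_on=["Serving Cell Identity", "E_ARFCN"],
    right_on=["CI", "EARFCN_DL"],
)
by_pcig = left.merge(
    right,
    left_on=["Serving Cell Identity", "E_ARFCN"],
    right_on=["PCIG", "EARFCN_DL"],
)

# A pair that matches on both CI and PCIG appears twice, so deduplicate.
matches = (
    pd.concat([by_ci, by_pcig], ignore_index=True)
    .drop_duplicates(subset=["row_id", "cell_id"])
    .reset_index(drop=True)
)
```

The distance column can then be computed on `matches` in one vectorized call, exactly as with the filtered cross merge.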

Update: trying the second idea, since the new dataframe seems to crash the kernel

In a loop, you can do something like this for the first condition (and similarly for all the following matching conditions):

for i in range(len(drive_test)):
  matching_records = Cells[((Cells.CI == drive_test['Serving Cell Identity'][i]) | (Cells.PCIG == drive_test['Serving Cell Identity'][i])) & (Cells.EARFCN_DL == drive_test['E_ARFCN'][i])]
  if len(matching_records) > 0:
    # Longitude/Latitude are assumed to be drive_test columns (as in the question's loop),
    # so they are read from row i; the scalars broadcast against the Cells arrays.
    matching_records = matching_records.assign(distance_between = haversine(drive_test['Longitude'][i], matching_records.LONGITUDE.to_numpy(), drive_test['Latitude'][i], matching_records.LATITUDE.to_numpy()))

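Either way this should be quite fast, because you use only one Python "for" loop and let the much faster numpy/pandas queries do the rest. The same template should also work for your remaining conditions.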
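
The loop above overwrites `matching_records` on each pass, so the per-row results still need to be collected somewhere. A minimal sketch of one way to do that on made-up data follows; the `haversine_m` helper, the `drive_row` column, the sample values, and the assumption that `Latitude`/`Longitude` live in `drive_test` are all illustrative.

```python
import numpy as np
import pandas as pd


def haversine_m(lon1, lon2, lat1, lat2):
    """Haversine distance in meters; inputs in degrees, arrays or scalars."""
    lon1, lon2, lat1, lat2 = map(np.deg2rad, (lon1, lon2, lat1, lat2))
    a = (np.sin((lat2 - lat1) * 0.5) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) * 0.5) ** 2)
    return 2 * 6371000 * np.arcsin(np.sqrt(a))


# Made-up stand-ins for the two CSV files.
drive_test = pd.DataFrame({
    "Serving Cell Identity": [10, 20, 30],
    "E_ARFCN": [100, 200, 300],
    "Latitude": [30.00, 30.10, 30.20],
    "Longitude": [31.00, 31.10, 31.20],
})
Cells = pd.DataFrame({
    "CI": [10, 99],
    "PCIG": [77, 20],
    "EARFCN_DL": [100, 200],
    "LATITUDE": [30.05, 30.15],
    "LONGITUDE": [31.05, 31.15],
})

results = []
for i in range(len(drive_test)):
    cell_id = drive_test["Serving Cell Identity"][i]
    matching = Cells[((Cells.CI == cell_id) | (Cells.PCIG == cell_id))
                     & (Cells.EARFCN_DL == drive_test["E_ARFCN"][i])]
    if len(matching) > 0:
        # The scalar drive_test coordinates broadcast against the matched cells.
        results.append(matching.assign(
            drive_row=i,
            distance_between=haversine_m(
                drive_test["Longitude"][i], matching.LONGITUDE.to_numpy(),
                drive_test["Latitude"][i], matching.LATITUDE.to_numpy()),
        ))

# Concatenate once at the end instead of growing a dataframe inside the loop.
distances = pd.concat(results, ignore_index=True) if results else pd.DataFrame()
```

Collecting the pieces in a list and concatenating once avoids repeatedly copying a growing dataframe inside the loop.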

Regarding "python - How to optimize Python code that calculates the distance between two GPS points", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/71862366/
