python - Removing similar items from a list

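I start with a list of tuples containing the x and y coordinates of points on a graph, and I want to remove duplicate points from the list. For my purposes, however, points within a distance of 10 of each other count as duplicates.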

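I have written a function that seems to do the job, but I bet there is a better way. In the sample data below, points 1, 2, and 5 are duplicates (within a distance of 10 of each other). I don't care which of the three survives the elimination. I don't expect to process more than 100 points, of which about 50% will be eliminated. Thanks!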

import math

def is_close(pointA, pointB, closeness):
    """Return True if the two points are less than `closeness` apart."""
    x1, y1 = pointA
    x2, y2 = pointB
    distance = math.hypot(x2 - x1, y2 - y1)  # Euclidean distance, not truncated to int
    return distance < closeness

def remove_close_duplicated(data, closeness):
    """Keep each point only if it is not close to a point already kept."""
    new_list_points = []
    for point in data:
        # Compare against the points kept so far; the first point is always kept.
        if not any(is_close(kept, point, closeness) for kept in new_list_points):
            new_list_points.append(point)
    return new_list_points

sample_data = [
    (600, 400), # 1
    (601, 401), # 2
    (725, 300), # 3
    (800, 900), # 4
    (601, 400), # 5
]

closeness = 10
print(remove_close_duplicated(sample_data, closeness))
# Output:
# [(600, 400), (725, 300), (800, 900)]

Best answer

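There are two parts to this: finding close pairs, and finding well-separated sets (the equivalence classes of the transitive closure of the nearness relation, or equivalently the connected components of the nearness graph).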

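With only 100 points, you can do the first part by brute force. More efficient options include sorting the points into square bins 10 units on a side, so that all of a point's near neighbors must lie in its own bin or an adjacent one, or storing the points in a k-d tree.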
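The binning idea can be sketched as follows. This is a minimal illustration, not code from the answer: the function name `close_pairs` and the choice to return index pairs are assumptions made for the example, and it uses the same strict `<` comparison as the question's `is_close`.

```python
import math
from collections import defaultdict
from itertools import product

def close_pairs(points, closeness):
    """Return sorted index pairs (i, j), i < j, whose points are less than `closeness` apart."""
    # Bucket each point index by the grid cell (side length = closeness) that contains it.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(math.floor(x / closeness), math.floor(y / closeness))].append(idx)

    pairs = []
    for (cx, cy), members in cells.items():
        # A neighbor closer than `closeness` must be in this cell or one of the 8 around it.
        for dx, dy in product((-1, 0, 1), repeat=2):
            for j in cells.get((cx + dx, cy + dy), ()):
                for i in members:
                    # The i < j check reports each pair once, from the lower index's cell.
                    if i < j and math.hypot(points[i][0] - points[j][0],
                                            points[i][1] - points[j][1]) < closeness:
                        pairs.append((i, j))
    return sorted(pairs)

sample_data = [(600, 400), (601, 401), (725, 300), (800, 900), (601, 400)]
print(close_pairs(sample_data, 10))  # [(0, 1), (0, 4), (1, 4)]
```

For around 100 points the brute-force double loop is likely just as practical; the grid mainly pays off as the number of points grows.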

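For the second part, a standard solution is to build a disjoint-set forest and apply the union operation to each pair of neighbors (arbitrarily choosing one point to store in the new root). The points associated with the roots at the end are the desired reduced set.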
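A minimal sketch of the disjoint-set approach, assuming brute-force pair finding (reasonable for about 100 points). The name `reduce_points` and the rule that the first root always survives a union are illustrative choices, not part of the answer.

```python
import math
from itertools import combinations

def reduce_points(points, closeness):
    """Keep one representative point per connected group of close points."""
    parent = list(range(len(points)))  # each point starts as its own root

    def find(i):
        # Walk up to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force pass over all pairs; union every pair closer than `closeness`.
    for a, b in combinations(range(len(points)), 2):
        if math.hypot(points[a][0] - points[b][0],
                      points[a][1] - points[b][1]) < closeness:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # arbitrary choice of which root survives

    # The points that are still roots form the reduced set.
    return [p for i, p in enumerate(points) if find(i) == i]

sample_data = [(600, 400), (601, 401), (725, 300), (800, 900), (601, 400)]
print(reduce_points(sample_data, 10))  # [(600, 400), (725, 300), (800, 900)]
```

Because this groups by connected components, a chain of points that are each within 10 of the next collapses to a single point. For example, `[(0, 0), (8, 0), (16, 0)]` reduces to `[(0, 0)]`, whereas the greedy function in the question would keep both `(0, 0)` and `(16, 0)`.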

Regarding "python - Removing similar items from a list", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52864544/
