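I have a DataFrame with float values (latitude/longitude; see the picture), and I want to remove values that lie close to each other, within a precision of 0.02. For example:
[0.03, 0.05, 0.04, 0.06] -> [0.04]
How can I do this with pandas methods?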
Best answer
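Although the asker was unwilling to provide more detail, the problem is interesting. I take it that the given coordinate points are to be combined into groups in which the latitude values and the longitude values are each no farther apart than twice the precision, i.e. within a square-like cluster, and that each cluster is to be condensed to one point near its center. This can be solved by ordering the points (e.g. with the help of the scikit-learn OPTICS implementation), dividing them into groups that satisfy the cluster condition, and choosing a near-center point from each group.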
import pandas as pd
df = pd.DataFrame({'lon': (20.489192, 20.47559, 20.481381, 20.4422, 20.474462),
                   'lat': (54.719898, 54.720311, 54.731917, 54.710419, 54.72706 )},
                  index=[3, 4, 20, 21, 24])

def group(x, minmax_group):     # this function clusters points within +/- 0.02
    if not hasattr(x, "__len__"): x = (x, )  # if it has to work for the one-dimensional case
    # in two-dimensional case, x is coordinate pair (longitude, latitude)
    # minmax_group[0][min] is the minimum coordinate pair (lower left) of a cluster
    # minmax_group[0][max] is the maximum coordinate pair (upper right) of a cluster
    # minmax_group[1] is the sequential index of the cluster
    if minmax_group[0] is None: minmax_group[:] = {min:x, max:x}, 0     # first cluster
    # check if longitude or latitude outside of current cluster
    elif any(x[l] < minmax_group[0][max][l]-.04
          or minmax_group[0][min][l]+.04 < x[l] for l in range(len(x))):
        minmax_group[0] = {min:x, max:x}
        minmax_group[1] += 1    # new cluster
    else:
        for m in minmax_group[0]:   # store current minimum/maximum coordinates
            minmax_group[0][m] = tuple(m(minmax_group[0][m][l], x[l]) for l in range(len(x)))
    return minmax_group[1]

from sklearn.cluster import OPTICS
opt = OPTICS().fit(df) # order the points
# group the points; set index because only index is passed to groupby function
dt = df.reset_index().set_index(['lon', 'lat']).iloc[opt.ordering_].groupby(
        lambda x, minmax_group=[None]: group(x, minmax_group)).apply(
        # choose point at the center of the group; set index back to original
        lambda g: g.reset_index().iloc[[(len(g)-1)//2]]).set_index('index').rename_axis(None)
print(dt)
Output of this example:
         lon        lat
4   20.47559  54.720311
21  20.44220  54.710419
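
For the one-dimensional example from the question, the same idea can be checked with a much simpler sketch. This is not part of the answer's code: `thin_close_values` and its `precision` parameter are hypothetical names introduced here, and the sketch assumes that sorting the values is an acceptable stand-in for the OPTICS ordering in one dimension. Sorted values are split into runs that span at most twice the precision, and the element at position `(len - 1) // 2` of each run is kept.

```python
import pandas as pd

def thin_close_values(values, precision=0.02):
    """Hypothetical helper (not from the original answer): keep one
    near-center value per run of values spanning at most 2 * precision."""
    s = pd.Series(values, dtype=float).sort_values()
    labels = []
    start = None      # smallest value of the current run
    label = -1        # sequential index of the current run
    for v in s:
        # open a new run when v lies more than 2 * precision above the run's minimum
        if start is None or v - start > 2 * precision:
            start = v
            label += 1
        labels.append(label)
    keys = pd.Series(labels, index=s.index)   # align run labels with the sorted values
    # same near-center choice as in the answer: position (len - 1) // 2 of each run
    picked = s.groupby(keys).apply(lambda g: g.iloc[(len(g) - 1) // 2])
    return picked.tolist()

print(thin_close_values([0.03, 0.05, 0.04, 0.06]))   # [0.04]
```

All four values fall within a span of 0.03, so they form one run and only 0.04 is kept, which matches the result asked for in the question.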
On the topic of python - removing close values, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60276842/