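I am trying to extract the second and third K nearest neighbours using this code. When they exist, I can get them. When they do not exist, I get an error like: IndexError: index 3 is out of bounds for axis 0 with size 3.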
import numpy as np
from sklearn.neighbors import NearestNeighbors
import pandas as pd

def nn(x):
    nbrs = NearestNeighbors(
        n_neighbors=3,
        algorithm='auto',
        metric='euclidean'
    ).fit(x)
    distances, indices = nbrs.kneighbors(x)
    return distances, indices

df = pd.DataFrame({'time': updated_df['upd_time_code'], 'x': updated_df['x'], 'y': updated_df['y'], 'id': updated_df['id']})
# This has the index of the nearest neighbor in the group, as well as the distance
nns = df.drop('id', 1).groupby('time').apply(lambda x: nn(x.as_matrix()))
groups = df.groupby('time')
nn_rows = []
for i, nn_set in enumerate(nns):
    group = groups.get_group(i)
    print("processing group at: ", group.time)
    for j, tup in enumerate(zip(nn_set[0], nn_set[1])):
        nn_rows.append({'time': i,
                        'id': group.iloc[j]['id'],
                        'nearest_neighbour1': group.iloc[tup[1][1]]['id'],
                        'nearest_neighbour2': group.iloc[tup[1][2]]['id'],
                        'nearest_neighbour3': group.iloc[tup[1][3]]['id'],
                        'euclidean_distance1': tup[0][1],
                        'euclidean_distance2': tup[0][2],
                        'euclidean_distance3': tup[0][2]})
nn_df = pd.DataFrame(nn_rows).set_index('time')
nn_df
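How do I handle the fact that a point sometimes has these neighbours and sometimes does not? Can this code be adjusted to ignore the missing ones?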
Best answer
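You are accessing the fourth nearest neighbour. With n_neighbors=3, kneighbors returns only three entries per row, so the valid positions are 0, 1 and 2, and tup[1][3] asks for a fourth entry that does not exist.
This is a classic array indexing error in your code that needs to be fixed.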
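One way to make the lookup tolerant of small groups is sketched below. It is only an illustration, not the asker's or the answerer's code: the helper name padded_neighbors and the placeholder values (NaN for a missing distance, -1 for a missing index) are assumptions chosen for this example. It relies on kneighbors() being called without a query array, in which case scikit-learn does not count a point as its own neighbour, and it never requests more neighbours than the group can supply.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def padded_neighbors(points, k=3):
    """Return (distances, indices), each of shape (n_points, k).

    Hypothetical helper: slots for neighbours that do not exist are
    filled with NaN (distance) and -1 (index) instead of raising.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    distances = np.full((n, k), np.nan)
    indices = np.full((n, k), -1, dtype=int)
    if n < 2:
        # A group with a single point has no neighbours at all.
        return distances, indices

    # A point can have at most n - 1 neighbours other than itself.
    n_query = min(k, n - 1)
    nbrs = NearestNeighbors(n_neighbors=n_query, metric='euclidean').fit(points)
    # No argument: neighbours of the fitted points, excluding each point itself.
    dist, idx = nbrs.kneighbors()

    # Copy the neighbours that exist; the remaining columns keep the padding.
    distances[:, :n_query] = dist
    indices[:, :n_query] = idx
    return distances, indices


# Three points but three neighbours requested: the third slot stays empty.
dist, idx = padded_neighbors([[0, 0], [1, 0], [3, 0]], k=3)
print(idx)
print(dist)
```

In the loop from the question, an index of -1 would then be mapped to a missing id (for example None) rather than passed to group.iloc, so rows from small groups are kept with blank neighbour columns instead of raising IndexError.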
Regarding python - code that extracts the second and third neighbours and ignores them when they do not exist, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59432203/