python - How is the parameter "weights" used in KNeighborsClassifier?

Tags: python machine-learning scikit-learn knn

In the sklearn documentation, the weights="distance" option of KNeighborsClassifier is explained as follows:

‘distance’ : weight points by the inverse of their distance. in this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.

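It makes sense to me to weight the neighboring points and then compute the prediction as a weighted average of those points, for example with KNeighborsRegressor. However, I cannot see how the weights are used in a classification algorithm. According to the book The Elements of Statistical Learning, KNN classification is based on majority voting, isn't it?

Best answer

In classification, the weights are used when computing the mode of the neighbors' labels: instead of counting how often each class appears among the neighbors, the classifier sums the weights per class and picks the class with the largest sum.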
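The weighted vote described above can be sketched in plain Python. This is a minimal illustration, not scikit-learn's code: the helper name weighted_vote and the toy labels and distances are assumptions made up for this example, and it assumes all distances are strictly positive.

```python
from collections import defaultdict


def weighted_vote(labels, distances, weights="uniform"):
    """Return the winning label among the k nearest neighbors.

    Hypothetical helper for illustration only; not scikit-learn's implementation.
    """
    scores = defaultdict(float)
    for label, dist in zip(labels, distances):
        # "uniform": every neighbor casts exactly one vote.
        # "distance": each vote is scaled by 1 / distance
        # (assumes dist > 0; a zero distance would need special handling).
        scores[label] += 1.0 if weights == "uniform" else 1.0 / dist
    # The class with the largest summed weight wins.
    return max(scores, key=scores.get)


# Three nearest neighbors of a query point: one close "a", two farther "b".
labels = ["a", "b", "b"]
distances = [0.1, 1.0, 1.1]

uniform_winner = weighted_vote(labels, distances, "uniform")
distance_winner = weighted_vote(labels, distances, "distance")
print(uniform_winner, distance_winner)
```

With uniform weights the two "b" neighbors outvote the single "a". With inverse-distance weights the close "a" neighbor contributes 1/0.1 = 10, which exceeds the combined 1/1.0 + 1/1.1 of the "b" neighbors, so "a" wins.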

For more details, see the actual implementation here.

Example from the documentation:

>>> from sklearn.utils.extmath import weighted_mode
>>> x = [4, 1, 4, 2, 4, 2]
>>> weights = [1, 1, 1, 1, 1, 1]
>>> weighted_mode(x, weights)
(array([4.]), array([3.]))
The value 4 appears three times: with uniform weights, the result is simply the mode of the distribution.

>>> weights = [1, 3, 0.5, 1.5, 1, 2]  # deweight the 4's
>>> weighted_mode(x, weights)
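(array([2.]), array([3.5]))
The value 2 now has the highest score: it appears twice, with weights 1.5 and 2, which sum to 3.5. The three 4's only sum to 2.5, and the single 1 has weight 3.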

The implementation can be viewed here.
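To see the effect on the classifier itself, here is a small sketch comparing weights="uniform" and weights="distance" on a toy one-dimensional dataset. The data points and variable names are made up for illustration; only KNeighborsClassifier and its n_neighbors and weights parameters come from scikit-learn.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy data (an assumption for this example): one class-0 point very close
# to the query, two class-1 points farther away.
X_train = [[0.1], [1.0], [1.1]]
y_train = [0, 1, 1]
query = [[0.0]]

uniform_knn = KNeighborsClassifier(n_neighbors=3, weights="uniform")
uniform_knn.fit(X_train, y_train)

distance_knn = KNeighborsClassifier(n_neighbors=3, weights="distance")
distance_knn.fit(X_train, y_train)

# Uniform weights: plain majority vote, so class 1 wins 2 to 1.
uniform_pred = uniform_knn.predict(query)[0]
# Distance weights: class 0 scores 1/0.1 = 10, class 1 scores 1/1.0 + 1/1.1.
distance_pred = distance_knn.predict(query)[0]
print(uniform_pred, distance_pred)
```

All three training points are neighbors of the query in both models, so the only difference between the two predictions is how the votes are weighted.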

This question and answer on "python - How is the parameter "weights" used in KNeighborsClassifier?" comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/56799923/
