python - Implementing a K-Neighbors classifier in scikit-learn with 3 features per object

I want to implement a KNeighborsClassifier with the scikit-learn module (http://scikit-learn.org/dev/modules/generated/sklearn.neighbors.KNeighborsClassifier.html).

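From my images I retrieve solidity, elongation and Hu moments features. How do I prepare this data for training and validation? Do I have to create a list with the 3 features [Hm, e, s] for every object I retrieve from my images (one image contains several objects)?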

I read this example (http://scikit-learn.org/dev/modules/generated/sklearn.neighbors.KNeighborsClassifier.html):

    X = [[0], [1], [2], [3]]
    y = [0, 0, 1, 1]
    from sklearn.neighbors import KNeighborsClassifier
    neigh = KNeighborsClassifier(n_neighbors=3)
    neigh.fit(X, y)

    print(neigh.predict([[1.1]]))
    print(neigh.predict_proba([[0.9]]))

Are X and y 2 features?

    samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
    from sklearn.neighbors import NearestNeighbors
    neigh = NearestNeighbors(n_neighbors=1)
    neigh.fit(samples)

    # The query must be 2-D: one row per query point
    print(neigh.kneighbors([[1., 1., 1.]]))

Why does the first example use X and y, while this one uses samples?

Best answer

Your first piece of code defines a classifier on `1d` data.

X represents the feature vectors.

    [0] is the feature vector of the first data example
    [1] is the feature vector of the second data example
    ...
    [[0],[1],[2],[3]] is a list of all data examples,
      each example has only 1 feature.

y represents the labels.
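In scikit-learn terms, X is a 2-D array of shape (n_samples, n_features) and y is a 1-D array with one label per sample. A quick NumPy check on the example data (a small illustrative sketch; `X_arr` and `y_arr` are just local names):

```python
import numpy as np

X = [[0], [1], [2], [3]]  # 4 samples, 1 feature each
y = [0, 0, 1, 1]          # one label per sample

X_arr = np.asarray(X)
y_arr = np.asarray(y)

print(X_arr.shape)  # (4, 1) -> (n_samples, n_features)
print(y_arr.shape)  # (4,)   -> (n_samples,)
```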

The following figure illustrates the idea:


Green nodes are data points with label 0.
Red nodes are data points with label 1.
Gray nodes are data points with an unknown label.

    print(neigh.predict([[1.1]]))

This is asking the classifier to predict a label for x=1.1.

    print(neigh.predict_proba([[0.9]]))

This is asking the classifier for membership probability estimates for each label.

Since both gray nodes are closer to the green nodes, the output below makes sense.

    [0] # green label
    [[ 0.66666667  0.33333333]]  # green label has greater probability
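To see where those numbers come from, here is a hand-rolled sketch of the same vote with k=3 and uniform weights. It is for illustration only, not scikit-learn's actual implementation, and the helper `knn_vote` is a hypothetical name. For x=1.1 the three nearest points are 1, 2 and 0 (labels 0, 1, 0); for x=0.9 they are 1, 0 and 2 (labels 0, 0, 1), so two of the three neighbours carry label 0.

```python
import numpy as np

X = np.array([[0], [1], [2], [3]], dtype=float)
y = np.array([0, 0, 1, 1])
k = 3

def knn_vote(query, X, y, k):
    """Return (predicted label, per-class fractions) for one query point."""
    # Euclidean distance from the query to every training sample
    dists = np.linalg.norm(X - np.asarray(query, dtype=float), axis=1)
    # Indices of the k closest training samples
    nearest = np.argsort(dists)[:k]
    # Fraction of the k neighbours carrying each label (uniform weights)
    classes = np.unique(y)
    proba = np.array([np.mean(y[nearest] == c) for c in classes])
    # Majority label = class with the largest fraction
    return classes[np.argmax(proba)], proba

label, _ = knn_vote([1.1], X, y, k)
_, proba = knn_vote([0.9], X, y, k)
print(label)  # 0
print(proba)  # [0.66666667 0.33333333]
```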

The second piece of code is actually explained well in the `scikit-learn` documentation:

In the following example, we construct a NeighborsClassifier class from an array representing our data set and ask who’s the closest point to [1,1,1]

    >>> samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
    >>> from sklearn.neighbors import NearestNeighbors
    >>> neigh = NearestNeighbors(n_neighbors=1)
    >>> neigh.fit(samples)
    NearestNeighbors(n_neighbors=1)
    >>> print(neigh.kneighbors([[1., 1., 1.]]))
    (array([[0.5]]), array([[2]]))

There is no target value here because NearestNeighbors is not a classifier: it only looks up the closest stored points, so no labels are needed.
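The two arrays returned by `kneighbors` are the distances and the indices of the nearest stored samples. A plain NumPy sketch (illustrative only, not how scikit-learn computes it internally) confirms the result: the third sample, [1, 1, 0.5], is at Euclidean distance sqrt(0 + 0 + 0.25) = 0.5 from [1, 1, 1].

```python
import numpy as np

samples = np.array([[0., 0., 0.], [0., .5, 0.], [1., 1., .5]])
query = np.array([1., 1., 1.])

# Euclidean distance from the query to each stored sample
dists = np.linalg.norm(samples - query, axis=1)
# Index of the single closest sample (n_neighbors=1)
nearest_idx = int(np.argmin(dists))

print(dists)                             # distances to samples 0, 1, 2
print(nearest_idx, dists[nearest_idx])  # 2 0.5
```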

For your own problem:

Since you need a classifier, you should use KNeighborsClassifier if you want to take the KNN approach. You can build your feature matrix X (one row of [h, e, s] per object, pooled over all your images) and label vector y as below:

    X = [ [h1, e1, s1],
          [h2, e2, s2],
          ...
        ]
    y = [label1, label2, ...]
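A minimal end-to-end sketch of that setup, including a train/validation split, assuming your per-object measurements are already extracted. The data below is synthetic, and the label rule and the names `n_objects` and `new_object` are hypothetical stand-ins for your own values. The `StandardScaler` step is an addition not in the answer above; it is included because KNN compares raw Euclidean distances, and Hu moments, elongation and solidity live on very different scales.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical, synthetic data: one row per object, pooled over all images.
# Columns: [hu_moment, elongation, solidity]; replace with your measurements.
rng = np.random.default_rng(0)
n_objects = 60
X = np.column_stack([
    rng.normal(1e-3, 2e-4, n_objects),  # Hu moment (very small values)
    rng.uniform(1.0, 5.0, n_objects),   # elongation
    rng.uniform(0.5, 1.0, n_objects),   # solidity
])
# Hypothetical labels: an arbitrary rule so the demo has some structure.
y = (X[:, 1] > 3.0).astype(int)

# Hold out 25% of the objects for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Scale each feature to zero mean / unit variance before computing distances,
# so that no feature dominates just because of its units.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X_train, y_train)

acc = model.score(X_val, y_val)    # mean accuracy on the validation set
new_object = [[1.1e-3, 2.5, 0.8]]  # hypothetical new [Hm, e, s] row
pred = model.predict(new_object)
print("validation accuracy:", acc)
print("predicted label:", pred)
```

If you keep all seven Hu moments instead of a single value, each row simply gets more columns (9 instead of 3); nothing else changes. For a more stable validation estimate than a single split, `sklearn.model_selection.cross_val_score` can be used on the same pipeline.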

Regarding "python - Implementing a K-Neighbors classifier in scikit-learn with 3 features per object", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14505716/
