python - Unexpected results when using scikit-learn's SVM classification algorithm (RBF kernel)

Using the example on this page, http://scikit-learn.org/stable/auto_examples/svm/plot_iris.html , I created my own plots with normally distributed data (standard deviation 10) instead of the iris data.

My plot looks like this: [figure: decision-boundary plots for the four classifiers]

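Note that the RBF kernel plot is very different from the one in the example. Apart from the red and blue regions, the whole area is classified as yellow. In other words, there are too many support vectors. I tried changing C and degree, but it did not help. The code I used to generate the plot is shown below.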

Note that I need to use the RBF kernel, because the polynomial kernel runs much slower than the RBF kernel.

import numpy as np
import pylab as pl
from sklearn import svm, datasets

FP_SIZE = 50
STD = 10

def gen(fp):

  data = []
  target = []

  fp_count = len(fp)

  # generate rssi reading for monitors / fingerprint points
  # using scikit-learn data structure
  for i in range(0, fp_count):
    for j in range(0,FP_SIZE):
      target.append(i)
      data.append(np.around(np.random.normal(fp[i],STD)))

  data = np.array(data)
  target = np.array(target)

  return data, target

fp = [[-30,-70],[-58,-30],[-60,-60]]

data, target = gen(fp)

# import some data to play with
# iris = datasets.load_iris()
X = data[:, :2]  # we only take the first two features. We could
                      # avoid this ugly slicing by using a two-dim dataset
Y = target

h = .02  # step size in the mesh

# we create an instance of SVM and fit our data. We do not scale our
# data since we want to plot the support vectors
C = 1.0  # SVM regularization parameter
svc = svm.SVC(kernel='linear', C=C).fit(X, Y)
rbf_svc = svm.SVC(kernel='rbf', gamma=0.7, C=C).fit(X, Y)
poly_svc = svm.SVC(kernel='poly', degree=3, C=C).fit(X, Y)
lin_svc = svm.LinearSVC(C=C).fit(X, Y)

# create a mesh to plot in
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

# title for the plots
titles = ['SVC with linear kernel',
          'SVC with RBF kernel',
          'SVC with polynomial (degree 3) kernel',
          'LinearSVC (linear kernel)']


for i, clf in enumerate((svc, rbf_svc, poly_svc, lin_svc)):
    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    pl.subplot(2, 2, i + 1)
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    pl.contourf(xx, yy, Z, cmap=pl.cm.Paired)
    pl.axis('off')

    # Plot also the training points
    pl.scatter(X[:, 0], X[:, 1], c=Y, cmap=pl.cm.Paired)

    pl.title(titles[i])

pl.show()

Best answer

Did you use any measure of correctness other than the plot you are getting?
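One way to get such a measure is cross-validated accuracy. The sketch below is only an illustration: `make_data` is a hypothetical helper that mimics the question's `gen()` with a fixed seed, and the 5-fold split is an arbitrary choice. It uses `cross_val_score` from `sklearn.model_selection`, which is where recent scikit-learn versions keep it.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def make_data(centers, n_per_class=50, std=10.0, seed=0):
    """Hypothetical stand-in for the question's gen(): rounded Gaussian
    samples around each center, with a fixed seed for repeatability."""
    rng = np.random.default_rng(seed)
    X = np.vstack([np.around(rng.normal(c, std, size=(n_per_class, 2)))
                   for c in centers])
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return X, y


X, y = make_data([[-30, -70], [-58, -30], [-60, -60]])

# Compare two of the classifiers from the question by 5-fold accuracy.
candidates = [
    ("linear", SVC(kernel="linear", C=1.0)),
    ("rbf, gamma=0.7", SVC(kernel="rbf", gamma=0.7, C=1.0)),
]
for name, clf in candidates:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

A number like this makes it easier to compare settings than judging the colored regions by eye.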

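SVMs usually need to be tuned with a grid search, especially with an RBF kernel. C only takes care of the regularization, which will do little if your data is not sparse to begin with.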

You need to run a grid search over gamma and C. They have a really good example of that here:

http://scikit-learn.org/0.13/auto_examples/grid_search_digits.html#example-grid-search-digits-py
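As a rough sketch of what such a search could look like on data like the question's: the grid values below are assumptions chosen for illustration, not recommendations, and the data is regenerated here with a fixed seed. `GridSearchCV` is imported from `sklearn.model_selection`, its location in recent scikit-learn releases (the linked 0.13 example predates that module).

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Regenerate data similar to the question's: three Gaussian clusters, std 10.
rng = np.random.default_rng(0)
centers = np.array([[-30, -70], [-58, -30], [-60, -60]])
X = np.vstack([np.around(rng.normal(c, 10.0, size=(50, 2))) for c in centers])
y = np.repeat(np.arange(3), 50)

# Illustrative grid (an assumption); it includes the question's gamma=0.7.
param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [1e-4, 1e-3, 1e-2, 1e-1, 0.7],
}

# Each (C, gamma) pair is scored by 5-fold cross-validation.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best CV accuracy: %.3f" % search.best_score_)
```

After fitting, `search.best_estimator_` is an `SVC` refit on all the data with the selected parameters, so it can be dropped into the plotting loop in place of `rbf_svc`.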

Also, their library already takes care of the cross-validation.

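Remember that those examples are good for toy datasets. The moment you feed in a new dataset, there is no reason to believe it will behave anything like the one in the example.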
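One plausible reason the example's settings do not transfer, offered here as an assumption rather than a confirmed diagnosis: the RBF kernel is exp(-gamma * ||x - x'||^2), so a gamma chosen for features that span a few units makes points 10 units apart look almost completely dissimilar. The sketch below counts support vectors for the question's gamma and for a much smaller, arbitrarily chosen one, on data regenerated with a fixed seed.

```python
import numpy as np
from sklearn.svm import SVC

# Regenerate data similar to the question's: three Gaussian clusters, std 10.
rng = np.random.default_rng(0)
centers = np.array([[-30, -70], [-58, -30], [-60, -60]])
X = np.vstack([np.around(rng.normal(c, 10.0, size=(50, 2))) for c in centers])
y = np.repeat(np.arange(3), 50)

n_sv = {}
for gamma in (0.7, 0.001):  # 0.001 is an arbitrary comparison value
    clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)
    # n_support_ holds the number of support vectors per class.
    n_sv[gamma] = int(clf.n_support_.sum())
    # Kernel value between two points that are 10 units apart.
    k = np.exp(-gamma * 10.0 ** 2)
    print(f"gamma={gamma}: {n_sv[gamma]} of {len(X)} points are support "
          f"vectors, kernel at distance 10 = {k:.3g}")
```

If the count for gamma=0.7 comes out close to the full training set, that would be consistent with the "too many support vectors" symptom described in the question.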

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/17794313/
