python - Problems plotting two different SVM approaches with matplotlib?

Tags: python python-2.7 numpy matplotlib scikit-learn

I want to plot two different approaches to a classification algorithm, as in this documentation example. This is what I tried:

from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np
tfidf_vect= TfidfVectorizer(use_idf=True, smooth_idf=True, sublinear_tf=False, ngram_range=(2,2))

import pandas as pd
df = pd.read_csv('/data.csv',
                     header=0, sep=',', names=['SentenceId', 'Sentence', 'Sentiment'])



X = tfidf_vect.fit_transform(df['Sentence'].values)
y = df['Sentiment'].values


from sklearn import cross_validation
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X,
                                                    y, test_size=0.33)
from sklearn.svm import SVC
#first svm
clf = SVC(kernel='linear')
clf.fit(reduced_data, y)
prediction = clf.predict(X_test)
w = clf.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(-10, 10)
yy = a * xx - clf.intercept_[0] / w[1]


# get the separating hyperplane using weighted classes

#second svm
wclf = SVC(kernel='linear', class_weight={5: 10},C=1000)
wclf.fit(reduced_data, y)
weighted_prediction = wclf.predict(X_test)



#PCA
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
pca.fit(X)


ww = wclf.coef_[0]
wa = -ww[0] / ww[1]
wyy = wa * xx - wclf.intercept_[0] / ww[1]

# plot separating hyperplanes and samples
import matplotlib.pyplot as plt
h0 = plt.plot(xx, yy, 'k-', label='no weights')
h1 = plt.plot(xx, wyy, 'k--', label='with weights')
plt.scatter(reduced_data[:, 0], reduced_data[:, 1], c=y, cmap=plt.cm.Paired)
plt.legend()

plt.axis('tight')
plt.show()

But I get the following exception:

Traceback (most recent call last):
  File "file.py", line 25, in <module>
    a = -w[0] / w[1]
  File "/usr/local/lib/python2.7/site-packages/scipy/sparse/csr.py", line 253, in __getitem__
    return self._get_row_slice(row, col)
  File "/usr/local/lib/python2.7/site-packages/scipy/sparse/csr.py", line 320, in _get_row_slice
    raise IndexError('index (%d) out of range' % i)
IndexError: index (1) out of range

How can I plot this task correctly in 2-D or 3-D with matplotlib? I also tried this, but the result is obviously wrong (image not included).

Thanks in advance. This is the data I am using for this.

When I print w, this is what I get:

  (0, 911)  -0.352103548716
  (0, 2346) -1.20396753467
  (0, 2482) -0.352103548716
  (0, 2288) -0.733605938797
  (0, 1175) -0.868966214318
  (0, 1936) -0.500071158622
  (0, 2558) -0.40965370142
  (0, 788)  -0.485330735934
  (0, 322)  -0.575610464517
  (0, 453)  -0.584854414882
  (0, 1913) -0.300076915818
  (0, 2411) -0.419065159403
  (0, 2017) -0.407926583824
  (0, 2363) -0.407926583824
  (0, 815)  -1.09245625795
  (0, 543)  -0.248207856236
  (0, 1082) -0.366433457602
  (0, 1312) -0.286768829333
  (0, 1525) -0.286768829333
  (0, 1677) -0.286768829333
  (0, 2679) -0.688619491265
  (0, 413)  -0.101096807406
  (0, 1322) -0.13561265293
  (0, 1488) -0.120403497624
  (0, 1901) -0.337806267742
  : :
  (0, 1609) 0.100116485705
  (0, 581)  0.276579777388
  (0, 2205) 0.241642287418
  (0, 1055) 0.0166785719624
  (0, 2390) 0.349485515339
  (0, 1866) 0.357035248059
  (0, 2098) 0.296454010725
  (0, 2391) 0.45905660273
  (0, 2601) 0.357035248059
  (0, 619)  0.350880030278
  (0, 129)  0.287439419266
  (0, 280)  0.432180530894
  (0, 1747) -0.172314049543
  (0, 1211) 0.573579514463
  (0, 86)   0.3152907757
  (0, 452)  0.305881204557
  (0, 513)  0.212678772368
  (0, 946)  -0.347372778859
  (0, 1194) 0.298193025133
  (0, 2039) 0.34451957335
  (0, 2483) 0.245366213834
  (0, 317)  0.355996551812
  (0, 977)  0.355996551812
  (0, 1151) 0.284383826645
  (0, 2110) 0.120512273328

It returned a very large sparse matrix.

Best answer

w = clf.coef_[0]
a = -w[0] / w[1]

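Your "w" appears to contain only one entry along its first axis: every element in the printed output has row index 0, so there is a single row. That is why you get the error when you try to access the second index with w[1].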
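
The behaviour described above can be reproduced without scikit-learn. The following is a minimal sketch, assuming that `clf.coef_` is a SciPy CSR matrix with one row, which is what the printed output in the question suggests. The `coef` values are made up for illustration and are not taken from the question's data.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in for clf.coef_ after fitting on sparse TF-IDF input:
# a sparse matrix with one row and one column per feature.
coef = sparse.csr_matrix(np.array([[0.5, -0.25, 0.0, 1.5]]))

w = coef[0]      # still a 1 x 4 sparse matrix, not a 1-D array
print(w.shape)

try:
    w[1]         # asks for row 1, but only row 0 exists
    raised = False
except IndexError as exc:
    raised = True
    print("IndexError:", exc)

# Converting to a dense 1-D array makes w_dense[0] and w_dense[1] scalars.
w_dense = coef.toarray().ravel()
slope = -w_dense[0] / w_dense[1]
print(slope)
```

Even with a dense `w`, the line formula `a = -w[0] / w[1]` only describes the decision boundary when the model was trained on exactly two features, which is not the case for a TF-IDF matrix with thousands of columns.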

Regarding "python - Problems plotting two different SVM approaches with matplotlib?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/28531573/
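
For the 2-D plot asked about in the question, one possible approach is to reduce the TF-IDF features to two components first and only then fit both SVMs on the reduced data. The sketch below is not part of the original answer. It swaps in `TruncatedSVD` for the question's `PCA` because it accepts sparse input directly, and it uses a few made-up sentences in place of the question's CSV file. The helper `boundary` is a hypothetical name introduced here.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# Hypothetical toy data standing in for the 'Sentence' and 'Sentiment' columns.
sentences = [
    "the film was dull and slow",
    "a dull plot and weak acting",
    "slow weak and boring film",
    "a great film with strong acting",
    "great plot and a strong cast",
    "strong cast and a brilliant film",
]
y = np.array([1, 1, 1, 5, 5, 5])

# Sparse TF-IDF matrix with one column per term.
X = TfidfVectorizer().fit_transform(sentences)

# Reduce to two dense columns so that a line in the plane is meaningful.
svd = TruncatedSVD(n_components=2, random_state=0)
reduced_data = svd.fit_transform(X)

# Fit both models on the SAME 2-D data that will be plotted.
clf = SVC(kernel="linear").fit(reduced_data, y)
wclf = SVC(kernel="linear", class_weight={5: 10}).fit(reduced_data, y)


def boundary(model, xx):
    """Solve w0*x + w1*y + b = 0 for y, for a linear SVC trained on 2 features."""
    w = model.coef_[0]  # dense array of length 2, because the input was dense
    return -(w[0] * xx + model.intercept_[0]) / w[1]


xx = np.linspace(reduced_data[:, 0].min(), reduced_data[:, 0].max())
yy = boundary(clf, xx)
wyy = boundary(wclf, xx)

plt.plot(xx, yy, "k-", label="no weights")
plt.plot(xx, wyy, "k--", label="with weights")
plt.scatter(reduced_data[:, 0], reduced_data[:, 1], c=y, cmap=plt.cm.Paired)
plt.legend()
# plt.show()  # uncomment to display the figure
```

Because the models are trained on dense two-column data, `coef_` has shape `(1, 2)` and `w[0]`, `w[1]` are plain numbers. Whether two components separate the classes well on real data is a separate question that this sketch does not address.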
