I would like to plot two different approaches to a classification algorithm, as in this documentation example. This is what I tried:
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np
tfidf_vect= TfidfVectorizer(use_idf=True, smooth_idf=True, sublinear_tf=False, ngram_range=(2,2))
import pandas as pd
df = pd.read_csv('/data.csv',
header=0, sep=',', names=['SentenceId', 'Sentence', 'Sentiment'])
X = tfidf_vect.fit_transform(df['Sentence'].values)
y = df['Sentiment'].values
from sklearn import cross_validation
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X,
y, test_size=0.33)
from sklearn.svm import SVC
#first svm
clf = SVC(kernel='linear')
clf.fit(reduced_data, y)
prediction = clf.predict(X_test)
w = clf.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(-10, 10)
yy = a * xx - clf.intercept_[0] / w[1]
# get the separating hyperplane using weighted classes
#second svm
wclf = SVC(kernel='linear', class_weight={5: 10},C=1000)
wclf.fit(reduced_data, y)
weighted_prediction = wclf.predict(X_test)
#PCA
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
pca.fit(X)
ww = wclf.coef_[0]
wa = -ww[0] / ww[1]
wyy = wa * xx - wclf.intercept_[0] / ww[1]
# plot separating hyperplanes and samples
import matplotlib.pyplot as plt
h0 = plt.plot(xx, yy, 'k-', label='no weights')
h1 = plt.plot(xx, wyy, 'k--', label='with weights')
plt.scatter(reduced_data[:, 0], reduced_data[:, 1], c=y, cmap=plt.cm.Paired)
plt.legend()
plt.axis('tight')
plt.show()
But I get the following exception:
Traceback (most recent call last):
File "file.py", line 25, in <module>
a = -w[0] / w[1]
File "/usr/local/lib/python2.7/site-packages/scipy/sparse/csr.py", line 253, in __getitem__
return self._get_row_slice(row, col)
File "/usr/local/lib/python2.7/site-packages/scipy/sparse/csr.py", line 320, in _get_row_slice
raise IndexError('index (%d) out of range' % i)
IndexError: index (1) out of range
How can I correctly plot this task in 2-D or 3-D with matplotlib? I also tried this, but it is clearly wrong:
Thanks in advance. This is the data I am using for this.
When I print w, I get the following:
(0, 911) -0.352103548716
(0, 2346) -1.20396753467
(0, 2482) -0.352103548716
(0, 2288) -0.733605938797
(0, 1175) -0.868966214318
(0, 1936) -0.500071158622
(0, 2558) -0.40965370142
(0, 788) -0.485330735934
(0, 322) -0.575610464517
(0, 453) -0.584854414882
(0, 1913) -0.300076915818
(0, 2411) -0.419065159403
(0, 2017) -0.407926583824
(0, 2363) -0.407926583824
(0, 815) -1.09245625795
(0, 543) -0.248207856236
(0, 1082) -0.366433457602
(0, 1312) -0.286768829333
(0, 1525) -0.286768829333
(0, 1677) -0.286768829333
(0, 2679) -0.688619491265
(0, 413) -0.101096807406
(0, 1322) -0.13561265293
(0, 1488) -0.120403497624
(0, 1901) -0.337806267742
: :
(0, 1609) 0.100116485705
(0, 581) 0.276579777388
(0, 2205) 0.241642287418
(0, 1055) 0.0166785719624
(0, 2390) 0.349485515339
(0, 1866) 0.357035248059
(0, 2098) 0.296454010725
(0, 2391) 0.45905660273
(0, 2601) 0.357035248059
(0, 619) 0.350880030278
(0, 129) 0.287439419266
(0, 280) 0.432180530894
(0, 1747) -0.172314049543
(0, 1211) 0.573579514463
(0, 86) 0.3152907757
(0, 452) 0.305881204557
(0, 513) 0.212678772368
(0, 946) -0.347372778859
(0, 1194) 0.298193025133
(0, 2039) 0.34451957335
(0, 2483) 0.245366213834
(0, 317) 0.355996551812
(0, 977) 0.355996551812
(0, 1151) 0.284383826645
(0, 2110) 0.120512273328
It returned a very large sparse matrix.
Best answer
w = clf.coef_[0]
a = -w[0] / w[1]
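Your `w` appears to contain only one row. Because the classifier was fit on sparse input, `clf.coef_[0]` is still a 1 x n_features sparse matrix rather than a 1-D array, so `w[1]` asks for a second row that does not exist. That is why you get the error when you try to access the second index, w[1].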
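To illustrate the indexing behavior, here is a minimal sketch that uses a small hand-made `scipy.sparse.csr_matrix` as a stand-in for `clf.coef_`. The four coefficient values and the names `coef` and `w_sparse` are made up for the example and are not taken from the question's data. It shows why `w[1]` fails on the sparse row and one possible way to get a plain 1-D array instead.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in for clf.coef_ after fitting on sparse input:
# one row of coefficients over four features (values are invented).
coef = sparse.csr_matrix(np.array([[0.5, -2.0, 0.0, 1.5]]))

# Indexing a csr_matrix with [0] returns row 0 as another 1 x 4 sparse
# matrix, not a 1-D vector.
w_sparse = coef[0]
print(w_sparse.shape)

# w_sparse[1] would therefore ask for row 1, which does not exist,
# and raise an IndexError like the one in the question.

# Converting to a dense array first gives ordinary element access.
w = coef.toarray()[0]
a = -w[0] / w[1]  # slope term, computed as in the question's code
print(w.shape, a)
```

Note that this only fixes the indexing. The line `yy = a * xx - clf.intercept_[0] / w[1]` describes the decision boundary only when the classifier has exactly two features, so the SVM would presumably need to be fit on the 2-D reduced data (what the question calls `reduced_data`) before the line is plotted.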
Regarding python - Problems plotting two different SVM approaches with matplotlib?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/28531573/