I'm trying to implement a learning algorithm that predicts whether an image's target value is 1 or 0. First, I set up my target values like this...
real = [1] * len(images)
fake = [0] * len(fake_images)
total_target = real + fake
total_target = numpy.array(total_target)
>>> [1 1 1 ... 0 0 0 0]
Next, I convert the list of images into a numpy array of numpy arrays, so each image is stored as a numpy array...
training_set = []
for image in total_images:
    im = image.convert("L")
    dataset = numpy.asarray(im)
    training_set.append(dataset)
training_set = numpy.array(training_set)
So training_set contains the images. The order of training_set matches the order of total_target, so the first image in training_set corresponds to the first value in total_target, which is 1 in the example above.
Next I flatten the training set...
n_samples = len(training_set)
data = training_set.reshape((n_samples, -1))
Now I pass it to the following...
classifier = svm.SVC(gamma=0.001)
classifier.fit(data[:n_samples-1], total_target[:n_samples-1])
I left out the last image and its corresponding value, because that is the one I want to predict...
expected = total_target[-1]
predicted = classifier.predict(data[-1])
When I run all of this, I get the following error...
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample. DeprecationWarning)
OK, according to the error, my total_target is in the wrong format, so I added the following...
total_target = numpy.array(total_target).reshape(-1, 1)
I run it, and now I get the following errors:
DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). y_ = column_or_1d(y, warn=True)
C:\Users\Eric\Anaconda2\lib\site-packages\sklearn\utils\validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample. DeprecationWarning)
I tried using ravel() on total_target, but that just takes me back to the previous error. I think my formatting is wrong; I'm still new to numpy arrays.
Best answer
NumPy's atleast_2d makes the code work.
Let's first generate some mock data: 5 real and 5 fake 8-bit images of 800 rows by 1200 columns:
In [111]: import numpy as np
In [112]: real, fake = 5, 5
In [113]: rows, cols = 800, 1200
In [114]: bits = 8
In [115]: target = np.hstack([np.ones(real), np.zeros(fake)])
In [116]: np.random.seed(2017)
In [117]: images = np.random.randint(2**bits, size=(real + fake, rows, cols))
In [118]: data = images.reshape(images.shape[0], -1)
In [119]: data
Out[119]:
array([[ 59, 9, 198, ..., 189, 201, 38],
[150, 251, 145, ..., 95, 214, 175],
[156, 212, 220, ..., 179, 63, 48],
...,
[ 25, 94, 108, ..., 159, 144, 216],
[179, 103, 217, ..., 92, 219, 34],
[198, 209, 177, ..., 6, 4, 144]])
In [120]: data.shape
Out[120]: (10L, 960000L)
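To make the flattening step easier to follow, here is a minimal sketch on a much smaller array. The 4 x 2 x 3 stack is a hypothetical stand-in for the image data, not part of the original session; it only illustrates how reshape with -1 turns each image into one row of features.

```python
import numpy as np

# Hypothetical tiny stand-in for the image stack: 4 "images" of 2 x 3 pixels.
images = np.arange(4 * 2 * 3).reshape(4, 2, 3)

# Keep the first axis (one row per image) and let -1 infer the row length.
data = images.reshape(images.shape[0], -1)

print(images.shape)  # (4, 2, 3)
print(data.shape)    # (4, 6)
print(data[1])       # [ 6  7  8  9 10 11]
```

Each row of data is one image read out row by row, which is the (n_samples, n_features) layout that sklearn estimators expect.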
Then we train the classifier using all images except the last one:
In [121]: from sklearn import svm
In [122]: classifier = svm.SVC(gamma=0.001)
In [123]: classifier.fit(data[:-1], target[:-1])
Out[123]:
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape=None, degree=3, gamma=0.001, kernel='rbf',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
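Note that the labels are passed to fit as a plain 1-D array. The sketch below, which uses only NumPy and variable names chosen for illustration, shows the difference between that shape and the column vector produced by reshape(-1, 1), which is what triggers the DataConversionWarning quoted in the question.

```python
import numpy as np

# 1-D labels of shape (n_samples,): the shape fit expects for y.
target = np.hstack([np.ones(5), np.zeros(5)])

# Reshaping to a column gives shape (n_samples, 1), which sklearn warns about.
column = target.reshape(-1, 1)

# ravel() flattens the column back to the original 1-D shape.
restored = column.ravel()

print(target.shape, column.shape, restored.shape)  # (10,) (10, 1) (10,)
```

In other words, the target array in the question was already in a usable shape before it was reshaped.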
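If you now try to classify the last image with classifier.predict(data[-1]), sklearn complains, because data[-1] is a 1-D array and predict expects a 2-D array with one row per sample. To keep sklearn happy, you just need to make sure the test data is two-dimensional, like this: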
In [124]: classifier.predict(np.atleast_2d(data[-1]))
Out[124]: array([ 1.])
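For comparison, here is a small sketch of three ways to obtain a single sample as a 2-D array. The 3 x 4 array is a hypothetical stand-in for the flattened image data; all three calls are standard NumPy and should produce the same (1, n_features) result.

```python
import numpy as np

# Hypothetical flattened data: 3 samples with 4 features each.
data = np.arange(12).reshape(3, 4)

last = data[-1]                      # 1-D, shape (4,): what triggers the warning
via_atleast_2d = np.atleast_2d(last) # shape (1, 4)
via_reshape = last.reshape(1, -1)    # shape (1, 4), as the warning text suggests
via_slice = data[-1:]                # slicing keeps the sample axis, shape (1, 4)

print(last.shape, via_atleast_2d.shape, via_reshape.shape, via_slice.shape)
```

Any of the three 2-D forms could be passed to classifier.predict in place of data[-1]; which one to use is mostly a matter of taste.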
Source: a Stack Overflow question about a scikit-learn data-fitting error in Python: https://stackoverflow.com/questions/41972452/