python - SkLearn : ValueError shapes not aligned during prediction

Tags: python numpy machine-learning scikit-learn

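I am building a program that assigns multiple labels/tags to text descriptions. I am using MultiOutputRegressor to tag the descriptions. When I predict on an array of vectorized text, the following error is raised on the last line (y_pred = clf.predict(yTest)):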

ValueError: shapes (74,28) and (3532,2) not aligned: 28 (dim 1) != 3532 (dim 0)

Here is my code:

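from scipy.sparse import csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MultiLabelBinarizer

# df and stopWords are defined elsewhere in the program
textList = df.Text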
vectorizer2 = TfidfVectorizer(stop_words=stopWords)
vectorizer2.fit(textList)
x = vectorizer2.transform(textList)

tagList = df.Tags
vectorizer = MultiLabelBinarizer()
vectorizer.fit(tagList)
y = vectorizer.transform(tagList)

print("x.shape = " + str(x.shape))
print("y.shape = " + str(y.shape))

xTrain, xTest, yTrain, yTest = train_test_split(x, y, test_size=0.50)

nb_clf = MultinomialNB()
sgd = SGDClassifier()
lr = LogisticRegression()
mn = MultinomialNB()

xTrain = csr_matrix(xTrain).toarray()
xTest = csr_matrix(xTest).toarray()
yTrain = csr_matrix(yTrain).toarray()

print("xTrain.shape = " + str(xTrain.shape))
print("xTest.shape = " + str(xTest.shape))
print("yTrain.shape = " + str(yTrain.shape))
print("yTest.shape = " + str(yTest.shape))

for classifier in [nb_clf, sgd, lr, mn]:
    clf = MultiOutputRegressor(classifier)
    clf.fit(xTrain, yTrain)
    y_pred = clf.predict(yTest)  # the ValueError is raised here

Here is the output of the shape print statements:

x.shape = (147, 3532)
y.shape = (147, 28)
xTrain.shape = (73, 3532)
xTest.shape = (74, 3532)
yTrain.shape = (73, 28)
yTest.shape = (74, 28)

Best answer

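This is probably just because you are passing yTest as the input to clf.predict() instead of xTest. The model was fitted on xTrain, which has 3532 feature columns, while yTest has only 28 columns; that is exactly the mismatch the error reports (28 != 3532).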
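A minimal sketch of the fix, using small synthetic arrays as a stand-in for the TF-IDF features and binarized tags from the question (the sizes, the random data, and the `mismatch_error` variable are assumptions made for illustration only). It shows that predicting on the label matrix fails, while predicting on the held-out features returns one row per test sample and one column per tag. The exact wording of the error message may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.naive_bayes import MultinomialNB

# Synthetic stand-ins (assumption): non-negative features and a binary tag matrix.
rng = np.random.default_rng(0)
n_samples, n_features, n_tags = 40, 12, 3
x = rng.random((n_samples, n_features))
# Deterministic tags so that every tag column contains both 0 and 1.
y = (np.arange(n_samples)[:, None] % (np.arange(n_tags) + 2) == 0).astype(int)

# Simple half/half split, mirroring test_size=0.50 in the question.
xTrain, xTest = x[:20], x[20:]
yTrain, yTest = y[:20], y[20:]

clf = MultiOutputRegressor(MultinomialNB())
clf.fit(xTrain, yTrain)

# Passing the labels instead of the features fails:
# yTest has n_tags columns, but the model was fitted on n_features columns.
try:
    clf.predict(yTest)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
print("predict(yTest) failed:", mismatch_error is not None)

# Correct call: predict on the held-out features.
y_pred = clf.predict(xTest)
print("y_pred.shape =", y_pred.shape)
```

Once `y_pred` has the same shape as `yTest`, the two can be compared directly, which is the role `yTest` is meant to play.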

Regarding python - SkLearn : ValueError shapes not aligned during prediction, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57388144/
