python-3.x - TypeError from MultinomialNB: float() argument must be a string or a number

Tags: python-3.x, machine-learning, scikit-learn, text-classification, naivebayes

I am trying to compare the performance of the multinomial, binomial and Bernoulli classifiers, but I get an error:

TypeError: float() argument must be a string or a number, not 'set'

Below is the code up to the MultinomialNB step.

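import random

import nltk
from nltk.classify.scikitlearn import SklearnClassifier
from nltk.corpus import movie_reviews
from sklearn.naive_bayes import MultinomialNB

documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]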

random.shuffle(documents)

#print(documents[1])

all_words = []

for w in movie_reviews.words():
    all_words.append(w.lower())

all_words = nltk.FreqDist(all_words)

word_features = list(all_words.keys())[:3000]

def look_for_features(document):
    words = set(document)
    features = {}
    for x in word_features:
        features[x] = {x in words}
    return features

# each feature set pairs a review's features with its category
featuresets = [(look_for_features(rev), category) for (rev, category) in documents]

training_set = featuresets[:1400]
testing_set = featuresets[1400:]

#Multinomial
MNB_classifier = SklearnClassifier(MultinomialNB())
MNB_classifier.train(training_set)
print("Accuracy:", nltk.classify.accuracy(MNB_classifier, testing_set) * 100)

The error seems to occur in MNB_classifier.train(training_set). It looks similar to the error described here.

Best answer

Change

features[x] = {x in words}

to

features[x] = x in words

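The original line builds featuresets in which every feature dictionary maps a word to {True} or {False}, so each feature value is a set rather than a bool. SklearnClassifier cannot work with that: a set cannot be converted to a float, which is exactly what the TypeError reports.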
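The difference can be reproduced without NLTK or scikit-learn. The following is a minimal sketch, assuming (as the error message suggests) that the feature values get passed to float() while the feature dictionaries are vectorized. The three-word vocabulary and the two function names are hypothetical stand-ins for the question's word_features and look_for_features.

```python
# Hypothetical miniature of the question's setup: a tiny vocabulary
# stands in for the 3000 entries of word_features.
word_features = ["good", "bad", "plot"]


def look_for_features_buggy(document):
    words = set(document)
    # {x in words} is a set literal, so every value is {True} or {False}.
    return {x: {x in words} for x in word_features}


def look_for_features_fixed(document):
    words = set(document)
    # x in words is a plain bool, which float() can convert.
    return {x: x in words for x in word_features}


review = ["a", "good", "plot"]
buggy = look_for_features_buggy(review)
fixed = look_for_features_fixed(review)

print(buggy["good"])         # {True}
print(fixed["good"])         # True
print(float(fixed["good"]))  # 1.0

# Converting the set-valued feature fails the same way training does.
try:
    float(buggy["good"])
except TypeError as err:
    message = str(err)
    print("TypeError:", message)
```

The exact wording of the TypeError message depends on the Python version, but it names 'set' as the offending type in each case.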


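The code looks very similar to the code in "Creating a module for Sentiment Analysis with NLTK". The author there wrote the value in parentheses, (x in words), but that is only a parenthesized expression, not a tuple, so it is no different from x in words.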
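A quick check of the four spellings shows why parentheses are harmless while braces are not. This is a small illustrative sketch; the variable names are made up for the example.

```python
words = {"good", "plot"}
x = "good"

plain = x in words            # a bool
parenthesized = (x in words)  # still a bool; the parentheses only group
one_tuple = (x in words,)     # the trailing comma is what makes a tuple
as_set = {x in words}         # braces build a set: the bug in the question

for value in (plain, parenthesized, one_tuple, as_set):
    print(type(value).__name__, value)
```

Only the first two forms produce a value that can be used directly as a boolean feature.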

Source: Stack Overflow, https://stackoverflow.com/questions/49415195/
