def word_feats(words):
    return dict([(word, True) for word in words])
for tweet in negTweets:
    words = re.findall(r"[\w']+|[.,!?;]", tweet) #splits the tweet into words
    negwords = [(word_feats(words), 'neg')] #tag the words with feature
    negfeats.append(negwords) #add the words to the feature list
for tweet in posTweets:
    words = re.findall(r"[\w']+|[.,!?;]", tweet)
    poswords = [(word_feats(words), 'pos')]
    posfeats.append(poswords)
negcutoff = len(negfeats)*3/4 #take 3/4ths of the words
poscutoff = len(posfeats)*3/4
trainfeats = negfeats[:negcutoff] + posfeats[:poscutoff] #assemble the train set
testfeats = negfeats[negcutoff:] + posfeats[poscutoff:]
classifier = NaiveBayesClassifier.train(trainfeats)
print 'accuracy:', nltk.classify.util.accuracy(classifier, testfeats)
classifier.show_most_informative_features()
When I run this code I get the following error...
File "C:\Python27\lib\nltk\classify\naivebayes.py", line 191, in train
for featureset, label in labeled_featuresets:
ValueError: need more than 1 value to unpack
The error comes from the classifier = NaiveBayesClassifier.train(trainfeats) line, and I'm not sure why. I've done something similar before, and my trainfeats seems to be in the same format as it was then... an example of that format is listed below...
[[({'me': True, 'af': True, 'this': True, 'joy': True, 'high': True, 'hookah': True, 'got': True}, 'pos')]]
What other value does my trainfeats need in order to create the classifier?
Best answer
@Prune's comment is right: your labeled_featuresets
should be a sequence of pairs (two-element lists or tuples): the feature dictionary and the category of each data point. Instead, each element of trainfeats
is a one-element list containing that two-element tuple. Drop the square brackets in the two feature-building loops and this part should work fine. For example,
negwords = (word_feats(words), 'neg')
negfeats.append(negwords)
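To see why the original shape fails, here is a minimal sketch with made-up feature dicts (no NLTK needed): train() unpacks each element of the sequence into a (featureset, label) pair, and a one-element list cannot be unpacked into two names.

```python
# Hypothetical featuresets illustrating the two shapes.
bad = [[({'joy': True}, 'pos')]]   # list of one-element lists (the question's shape)
good = [({'joy': True}, 'pos')]    # list of (features, label) pairs

def unpack(labeled_featuresets):
    # Same unpacking pattern NaiveBayesClassifier.train uses internally.
    return [(featureset, label) for featureset, label in labeled_featuresets]

try:
    unpack(bad)                    # each item has 1 element, not 2 -> ValueError
except ValueError as e:
    print(e)

print(unpack(good))                # works once the extra brackets are gone
```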
Two more things: consider using nltk.word_tokenize()
instead of doing your own tokenization. And you should randomize the order of your training data, e.g. with random.shuffle(trainfeats)
.
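Shuffling matters here because trainfeats is built as all negative examples followed by all positive ones. A short sketch with dummy labeled pairs (the feature dicts are made up) showing the shuffle after the 3/4 split:

```python
import random

# Dummy (features, label) pairs standing in for negfeats and posfeats.
negfeats = [({'bad': True}, 'neg') for _ in range(4)]
posfeats = [({'good': True}, 'pos') for _ in range(4)]

trainfeats = negfeats[:3] + posfeats[:3]  # 3/4 of each class for training
random.shuffle(trainfeats)                # mix the two classes in place

print([label for _, label in trainfeats])
```

The shuffle only reorders the list; both classes are still fully present, just interleaved.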
Original question: Python NLTK Classifier.train(trainfeats)... ValueError: need more than 1 value to unpack, on Stack Overflow: https://stackoverflow.com/questions/40532789/