python - TypeError in scikit-learn CountVectorizer

I am trying to do some text analysis with scikit-learn, but calling CountVectorizer raises an error. The sample code and the resulting traceback are below:

    >>> from sklearn.feature_extraction.text import CountVectorizer
    >>> corpus = [
    ...     'This is the first document.',
    ...     'This is the second second document.',
    ...     'And    the third one.',
    ...     'Is this the first document?',
    ... ]
    >>> vectorizer = CountVectorizer(min_df=1)
    >>> X = vectorizer.fit_transform(corpus)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Library/Python/2.6/site-packages/sklearn/feature_extraction/text.py", line 789, in fit_transform
        vocabulary, X = self._count_vocab(raw_documents, self.fixed_vocabulary)
      File "/Library/Python/2.6/site-packages/sklearn/feature_extraction/text.py", line 716, in _count_vocab
        vocabulary = defaultdict(None)
    TypeError: first argument must be callable

Is this a problem with my installation, or something else? Other examples run fine.

Best answer

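Summarizing the discussion in the comments: this is a bug in Python 2.6.1 that was fixed in later releases of Python 2.6 (and in later versions: 2.7+, 3.2+, ...). The traceback fails on defaultdict(None), which is a valid call, since the default_factory argument is allowed to be None. The error therefore comes from the interpreter, not from scikit-learn or the installation.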
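To check whether a given interpreter is affected, one option is to try the failing call directly, outside scikit-learn. The snippet below is a minimal sketch; the helper name defaultdict_accepts_none is made up for illustration and is not part of any library.

```python
import sys
from collections import defaultdict


def defaultdict_accepts_none():
    """Return True if this interpreter allows defaultdict(None).

    Hypothetical helper for illustration only. It repeats the call
    that fails in the traceback (line 716 of sklearn's text.py).
    """
    try:
        d = defaultdict(None)
    except TypeError:
        # The behaviour reported in the question: the interpreter
        # rejects None as the default_factory argument.
        return False
    return d.default_factory is None


# sys.version.split()[0] is the version string, e.g. "2.6.1".
print("Python %s: defaultdict(None) accepted = %s"
      % (sys.version.split()[0], defaultdict_accepts_none()))
```

If this reports False, upgrading the Python interpreter is the fix suggested by the answer, and reinstalling scikit-learn would not be expected to help.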

Original question on Stack Overflow: https://stackoverflow.com/questions/19007407/
