python - get_feature_names not found in countvectorizer()

Tags: python pandas sklearn-pandas countvectorizer

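I'm mining the Stack Overflow data dump for posts about deep learning libraries. I want to identify stop words in my corpus (for example "python"), so I need the feature names in order to find the terms with the highest frequencies.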

I build the documents and the corpus like this:

import csv
from sklearn.feature_extraction.text import CountVectorizer

with open("StackOverflow_2018_Data.csv") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    pytorch_doc = ''
    tensorflow_doc = ''
    cotag_list = []
    keras_doc = ''
    counte = 0
    for row in csv_reader:
        if row[2] == 'tensorflow':
            tensorflow_doc += row[3] + ' '
        if row[2] == 'keras':
            keras_doc += row[3] + ' '
        if row[2] == 'pytorch':
            pytorch_doc += row[3] + ' '

corpus = [pytorch_doc, tensorflow_doc, keras_doc]
vectorizer = CountVectorizer()
x = vectorizer.fit_transform(corpus)
print(x)
x.toarray()
Dict = []
feat = x.get_feature_names()
for i,arr in enumerate(x):
    for x, ele in enumerate(arr):
        if i == 0:
            Dict += ('pytorch', feat[x], ele)
        if i == 1:
            Dict += ('tensorflow', feat[x], ele)
        if i == 2:
            Dict += ('keras', feat[x], ele)

sorted_arr = sorted(Dict, key=lambda tup: tup[2])

However, I get:

  File "sklearn_stopwords.py", line 83, in <module>
    main()
  File "sklearn_stopwords.py", line 50, in main
    feat = x.get_feature_names()
  File "/opt/anaconda3/lib/python3.7/site-packages/scipy/sparse/base.py", line 686, in __getattr__
    raise AttributeError(attr + " not found")
AttributeError: get_feature_names not found

Best answer

get_feature_names is a method of the CountVectorizer object. You are calling get_feature_names on the result of fit_transform, which is a scipy.sparse matrix and has no such method.

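Call it on the vectorizer instead: vectorizer.get_feature_names(). (In scikit-learn 1.0 and later the method is vectorizer.get_feature_names_out(); get_feature_names was deprecated in 1.0 and removed in 1.2.)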
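If the same script has to run on both older and newer scikit-learn releases, it can use whichever method exists. A minimal sketch; feature_names is a hypothetical helper name, not part of scikit-learn.

```python
from sklearn.feature_extraction.text import CountVectorizer

def feature_names(vectorizer):
    """Return the fitted vocabulary as a list of str (hypothetical helper)."""
    if hasattr(vectorizer, "get_feature_names_out"):
        # scikit-learn >= 1.0: returns a NumPy array, so convert to plain strings
        return [str(name) for name in vectorizer.get_feature_names_out()]
    # scikit-learn < 1.2: the older method already returns a list
    return vectorizer.get_feature_names()

vectorizer = CountVectorizer()
vectorizer.fit(["This is the first document.", "And the third one."])
print(feature_names(vectorizer))
```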

Try this MCVE:

from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
corpus = ['This is the first document.',
          'This is the second second document.',
          'And the third one.',
          'Is this the first document?']

X = vectorizer.fit_transform(corpus)

# Call get_feature_names on the fitted vectorizer, not on the matrix X
features = vectorizer.get_feature_names()

print(features)

Output:

['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']

Source: Stack Overflow, "python - get_feature_names not found in countvectorizer()": https://stackoverflow.com/questions/55523491/
