Python NLTK: loop prints column headers instead of values

Tags: python pandas nlp tokenize stop-words

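I have tokenized the sentences from a CSV file, but when I try to remove the stop words inside the for loop, it stops printing the words and instead prints the column headers for every sentence. I can't tell what is wrong with the last line.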

for review in tokenized_docs:
    new_review = []
    for token in review:
        new_token = x.sub(u'', token)
        if not new_token == u'':
            new_review.append(new_token)
    tokenized_docs_no_punctuation.append(new_review)
    words=pd.DataFrame(tokenized_docs_no_punctuation)
    #print(words)
    print([word for word in words if word not in stops])

The output shows the column header numbers, when it should show the words.

Best answer

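Because words in your code is a DataFrame, iterating over it yields its column labels, so word in the for loop takes the values 0, 1, 2, ... rather than the tokens.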
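The behavior can be reproduced in a few lines. This is a minimal sketch: the sample tokens are made up for illustration and stand in for the asker's tokenized_docs_no_punctuation.

```python
import pandas as pd

# Hypothetical sample data standing in for tokenized_docs_no_punctuation.
tokenized_docs_no_punctuation = [["this", "is", "good"], ["not", "bad"]]

words = pd.DataFrame(tokenized_docs_no_punctuation)

# Iterating over a DataFrame yields its column labels, not its cell values.
column_labels = [word for word in words]
print(column_labels)  # [0, 1, 2]

# Iterating over one inner list of the original nested list yields the tokens.
first_review_tokens = [word for word in tokenized_docs_no_punctuation[0]]
print(first_review_tokens)  # ['this', 'is', 'good']
```

Since none of the integer labels is a stop word, the comprehension in the question prints all of them unchanged.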

You only need to use a list instead of a DataFrame. For example:

# before
# words=pd.DataFrame(tokenized_docs_no_punctuation)

# after
words = tokenized_docs_no_punctuation[0]

This works for me.
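For a fuller picture, here is a hedged sketch of the whole loop using plain lists throughout. Several pieces are assumptions, because the question does not show them: the definition of x (assumed here to be a regex that strips punctuation), the contents of stops (a tiny hand-written set here, where the asker presumably used NLTK's stop-word list), the sample reviews, and the lower-casing before the stop-word check.

```python
import re
import string

# Assumption: x removes punctuation characters; the question does not show it.
x = re.compile("[%s]" % re.escape(string.punctuation))

# Assumption: a small hand-written stop-word set used only for illustration.
stops = {"the", "is", "a", "this"}

# Hypothetical tokenized reviews.
tokenized_docs = [
    ["This", "movie", "is", "great", "!"],
    ["A", "dull", ",", "slow", "plot", "."],
]

filtered_docs = []
for review in tokenized_docs:
    new_review = []
    for token in review:
        new_token = x.sub("", token)
        if new_token != "":
            new_review.append(new_token)
    # Filter the current review's tokens directly; no DataFrame is involved.
    filtered_docs.append([w for w in new_review if w.lower() not in stops])

print(filtered_docs)
# [['movie', 'great'], ['dull', 'slow', 'plot']]
```

Filtering new_review inside the loop handles every review, whereas indexing with [0] as in the answer only ever looks at the first one.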

Source: a similar question on Stack Overflow, https://stackoverflow.com/questions/59454689/
