python - Stop word removal dilemma

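I have run into a problem with NLTK's stop word list. I use NLTK to remove stop words from user-generated content collected from a social media platform. However, I want to keep the personal pronouns in the user text, because they matter for my classification task. These are words such as "I", "you", and "we".

Unfortunately, stop word removal strips these words out as well, and I need them to stay. How can I solve this?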

Best answer

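import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')  # fetch the stop word corpus if it is not already installed
stop_words = stopwords.words('english')
print(type(stop_words))  # <class 'list'>
print(len(stop_words))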

If you look at the output, stop_words is a plain list, so you can remove entries from it before filtering your text:

personal_pronouns = ['i', 'you', 'we', 'she', 'he', 'they']  # add any other words you want to keep in the text
for word in personal_pronouns:
    if word in stop_words:
        stop_words.remove(word)
        print(word + ' removed from stop words')
print(len(stop_words))  # smaller than before by the number of words removed
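
To show how the trimmed list might then be applied, here is a minimal sketch of the filtering step. The helper names build_stop_set and remove_stop_words are hypothetical, not part of NLTK, and the short sample_stop_words list is only a stand-in so the example runs without downloading the corpus; in practice you would pass stopwords.words('english') instead. Tokenizing with str.split() is also a simplifying assumption.

```python
def build_stop_set(base_stop_words, keep):
    """Return the stop words as a set, minus the words we want to keep."""
    return set(base_stop_words) - set(keep)


def remove_stop_words(tokens, stop_set):
    """Drop every token whose lowercase form is in stop_set."""
    return [tok for tok in tokens if tok.lower() not in stop_set]


# Stand-in sample (assumption); replace with stopwords.words('english').
sample_stop_words = ['i', 'you', 'we', 'the', 'is', 'a', 'and', 'to']
keep = ['i', 'you', 'we']  # personal pronouns to preserve

stop_set = build_stop_set(sample_stop_words, keep)
tokens = "I think you and the team should go to the park".split()
filtered = remove_stop_words(tokens, stop_set)
print(filtered)
```

Using a set rather than a list keeps each membership check cheap, which helps when filtering a large volume of posts. Lowercasing each token before the lookup means "I" at the start of a sentence is compared against the lowercase entries in the stop word list.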

Regarding "python - Stop word removal dilemma", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61458623/
