python - Removing stop words from a list

This variable:

sent=[('include', 'details', 'about', 'your performance'),
('show', 'the', 'results,', 'which', 'you\'ve', 'got')]

needs to have its stop words removed.
I tried

output = [w for w in sent if not w in stop_words]

but it didn't work.
What's wrong?

Best answer

from nltk.corpus import stopwords

stop_words = {w.lower() for w in stopwords.words('english')}

sent = [('include', 'details', 'about', 'your', 'performance'),
        ('show', 'the', 'results,', 'which', 'you\'ve', 'got')]

If you want a single flat list of words with the stop words removed:

>>> no_stop_words = [word for sentence in sent for word in sentence if word not in stop_words]
>>> no_stop_words
['include', 'details', 'performance', 'show', 'results,', 'got']

If you want to keep the sentences intact:

>>> sent_no_stop = [[word for word in sentence if word not in stop_words] for sentence in sent]
>>> sent_no_stop
[['include', 'details', 'performance'], ['show', 'results,', 'got']]
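As for why the original attempt appeared to do nothing: iterating over `sent` yields whole tuples, and a tuple is never equal to a stop word string, so every element passes the filter. The sketch below illustrates this. Its `stop_words` set is a small hand-written stand-in (an assumption, so the example runs without downloading the NLTK corpus), not the actual NLTK list.

```python
# Small stand-in for NLTK's English stop word list (an assumption made so the
# sketch runs without the NLTK corpus; the real list is much longer).
stop_words = {"about", "your", "the", "which", "you've"}

sent = [('include', 'details', 'about', 'your', 'performance'),
        ('show', 'the', 'results,', 'which', "you've", 'got')]

# Original attempt: each w is a whole tuple, and no tuple equals a stop word
# string, so nothing is filtered out.
unchanged = [w for w in sent if w not in stop_words]

# Iterating one level deeper compares individual words instead.
filtered = [[word for word in sentence if word not in stop_words]
            for sentence in sent]

print(unchanged == sent)
print(filtered)
```

The first comparison shows the list coming back untouched, while the nested comprehension filters word by word.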

However, most of the time you will be working with a flat list of words (without the inner parentheses):

sent = ['include', 'details', 'about', 'your', 'performance', 'show', 'the', 'results,', 'which', 'you\'ve', 'got']

>>> no_stopwords = [word for word in sent if word not in stop_words]
>>> no_stopwords
['include', 'details', 'performance', 'show', 'results,', 'got']
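Note that the comprehensions above compare each word exactly as written, so a capitalized stop word such as 'The' would slip through. A minimal sketch of a case-insensitive variant follows. The helper name `remove_stop_words` is hypothetical, and the `stop_words` set is again a small hand-written stand-in for the NLTK list, used so the example runs on its own.

```python
def remove_stop_words(words, stop_words):
    """Return the words whose lowercase form is not in stop_words.

    Hypothetical helper for illustration; the original casing of the
    words that are kept is preserved.
    """
    return [w for w in words if w.lower() not in stop_words]


# Stand-in stop word set (assumption), all lowercase like the NLTK list.
stop_words = {"about", "your", "the", "which", "you've"}

words = ['Include', 'details', 'About', 'your', 'performance',
         'Show', 'The', 'results,']

cleaned = remove_stop_words(words, stop_words)
print(cleaned)
```

Punctuation is not stripped here, so a token like 'results,' keeps its trailing comma, just as in the outputs above.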

Regarding python - Removing stop words from a list, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/62293141/
