python - Adding and removing words in the NLTK stopwords list

Tags: python python-3.x list set nltk

I'm trying to add words to, and remove words from, the NLTK stopwords list:

from nltk.corpus import stopwords

stop_words = set(stopwords.words('french'))

#add words that aren't in the NLTK stopwords list
new_stopwords = ['cette', 'les', 'cet']
new_stopwords_list = set(stop_words.extend(new_stopwords))

#remove words that are in NLTK stopwords list
not_stopwords = {'n', 'pas', 'ne'} 
final_stop_words = set([word for word in new_stopwords_list if word not in not_stopwords])

print(final_stop_words)

Output:

Traceback (most recent call last):
  File "test_stop.py", line 10, in <module>
    new_stopwords_list = set(stop_words.extend(new_stopwords))
AttributeError: 'set' object has no attribute 'extend'

Best answer

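A set has no extend method (that is a list method). Use union instead, which accepts any iterable and returns a new set. Try this: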

from nltk.corpus import stopwords

stop_words = set(stopwords.words('french'))

# add words that aren't in the NLTK stopwords list
new_stopwords = ['cette', 'les', 'cet']
new_stopwords_list = stop_words.union(new_stopwords)

# remove words that are in the NLTK stopwords list
not_stopwords = {'n', 'pas', 'ne'}
final_stop_words = {word for word in new_stopwords_list if word not in not_stopwords}

print(final_stop_words)
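
The same add-then-remove step can also be written with set operators. The following is only a minimal sketch, not part of the original answer: to stay runnable without the NLTK corpus, it uses a small hypothetical stand-in for set(stopwords.words('french')), and the name in_place is introduced purely for illustration.

```python
# Hypothetical stand-in for set(stopwords.words('french'));
# the real NLTK list is much longer.
stop_words = {'le', 'la', 'ne', 'pas', 'n', 'et'}

new_stopwords = ['cette', 'les', 'cet']   # words to add
not_stopwords = {'n', 'pas', 'ne'}        # words to remove

# | is union (add words), - is difference (remove words);
# both return new sets and leave stop_words unchanged.
final_stop_words = (stop_words | set(new_stopwords)) - not_stopwords

# In-place alternative on a copy: update() accepts any iterable,
# difference_update() removes every member of the given set.
in_place = set(stop_words)
in_place.update(new_stopwords)
in_place.difference_update(not_stopwords)

print(sorted(final_stop_words))
# ['cet', 'cette', 'et', 'la', 'le', 'les']
```

With the real corpus, only the first assignment changes back to set(stopwords.words('french')); the operators need both operands to be sets, which is why new_stopwords is wrapped in set() here, whereas union() and update() take the list directly.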

For more on python - Adding and removing words in the NLTK stopwords list, see the similar question on Stack Overflow: https://stackoverflow.com/questions/51534586/
