I'm trying to remove "dan" during filtering, but it doesn't work. Here is my code:
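import csv
import re
import nltk
from nltk.corpus import stopwords  # may need nltk.download("stopwords") and nltk.download("punkt") once

def replaceMultiple(text, chars, replacement):
    # Hypothetical helper: the question does not show its definition.
    for ch in chars:
        text = text.replace(ch, replacement)
    return text

# The stopword sets do not depend on the row, so build them once, outside the loop.
stop_words = set(stopwords.words("indonesian"))
stop_words_new = ['aku', 'dan', 'duh', 'hhhmmm', 'thn', 'nih', 'tgl',
                  'hai', 'jazz', 'bro', 'broo', 'msh', '']
new_stopwords_list = stop_words.union(stop_words_new)

with open("data.csv", newline="", encoding="utf-8") as f:  # hypothetical file name
    readCSV = csv.reader(f)
    for row in readCSV:
        _word = []
        username = row[0]
        date = row[1]
        text = row[2].lower()
        text = re.sub(r'@[A-Za-z0-9_]+', '', text)  # drop @mentions
        text = re.sub(r'http\S+', '', text)  # drop URLs
        text = replaceMultiple(text, ["!","@","#","$","%","^","&","*","(",
                                      ")","_","-","+","=","{","}","[","]",
                                      "\\","/",",",".","?","<",">",":",";",
                                      "'",'"',"~","0","1","2","3","4","5","6","7","8","9"], '')
        text = text.strip()
        nltk_tokens = nltk.word_tokenize(text)
        # The step that filters nltk_tokens against new_stopwords_list is not shown in the question.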
Every word in stop_words_new is removed except "dan". Why?
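The question does not show how `replaceMultiple` is defined. For the same cleanup step, here is a minimal sketch that swaps in the standard-library `str.translate`; the helper name `clean_text` and the sample input are hypothetical, and the deleted characters mirror the list passed to `replaceMultiple` in the code above.

```python
import re

# Characters deleted in the question: ASCII punctuation plus the digits 0-9.
STRIP_CHARS = "!@#$%^&*()_-+={}[]\\/,.?<>:;'\"~0123456789"
# Translation table that maps every character in STRIP_CHARS to None (deletion).
STRIP_TABLE = str.maketrans("", "", STRIP_CHARS)


def clean_text(raw: str) -> str:
    """Hypothetical helper: lowercase, drop @mentions and URLs, delete STRIP_CHARS."""
    text = raw.lower()
    text = re.sub(r"@[A-Za-z0-9_]+", "", text)  # remove @mentions
    text = re.sub(r"http\S+", "", text)  # remove URLs
    text = text.translate(STRIP_TABLE)  # delete punctuation and digits
    return text.strip()


print(clean_text("@budi Aku dan kamu! http://t.co/x 123"))  # prints: aku dan kamu
```

`str.maketrans("", "", chars)` builds a table that deletes every listed character in a single pass over the string, instead of calling `str.replace` once per character.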
Best answer
The code shouldn't work, because you are joining a set with a list. Try making stop_words_new a set instead of a list.
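A minimal sketch of the answer's suggestion, assuming the tokens are then filtered with a plain membership test against the combined set. To keep it self-contained, `base_stop_words` is a small hypothetical stand-in for `set(stopwords.words("indonesian"))`, `remove_stopwords` is a hypothetical helper name, and `str.split` stands in for `nltk.word_tokenize`.

```python
# Hypothetical stand-in for set(stopwords.words("indonesian")).
base_stop_words = {"yang", "di", "ke"}

# Extra stopwords from the question, written as a set literal as the answer suggests.
stop_words_new = {"aku", "dan", "duh", "hhhmmm", "thn", "nih", "tgl",
                  "hai", "jazz", "bro", "broo", "msh", ""}

# Set union: every word that appears in either set.
new_stopwords_list = base_stop_words | stop_words_new


def remove_stopwords(tokens):
    """Keep only tokens that are not in the combined stopword set."""
    return [tok for tok in tokens if tok not in new_stopwords_list]


tokens = "aku dan kamu pergi ke pasar".split()  # stand-in for nltk.word_tokenize
print(remove_stopwords(tokens))  # prints: ['kamu', 'pergi', 'pasar']
```

The membership test is an exact string comparison, so a token such as "Dan" would survive unless the text is lowercased first, as the question's code does with `row[2].lower()`.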
Regarding "python - Stopwords not removing one word": a similar question was found on Stack Overflow: https://stackoverflow.com/questions/56705074/