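I have this code that should remove all words shorter than 4 characters from a list, but it only removes some of them (I'm not sure which ones), and definitely not all: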
#load in the words from the original text file
def load_words():
    with open('words_alpha.txt') as word_file:
        valid_words = [word_file.read().split()]
    return valid_words
english_words = load_words()
print("loading...")
print(len(english_words[0]))
#remove words under 4 letters
for word in english_words[0]:
    if len(word) < 4:
        english_words[0].remove(word)
print("done")
print(len(english_words[0]))
#save the remaining words to a new text file
new_words = open("english_words_v3.txt","w")
for word in english_words[0]:
    new_words.write(word)
    new_words.write("\n")
new_words.close()
It outputs:
loading...
370103
done
367945
There are 67000 English words in words_alpha.txt
Best answer
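You want to iterate over a copy of the list, which you can get with english_words[0][:]. Right now you are iterating over the same list you are modifying, which causes the strange behavior: each time remove() deletes an item, the remaining items shift left by one, so the loop skips the item that came right after the removed one. So the for loop would look like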
for word in english_words[0][:]:
    if len(word) < 4:
        english_words[0].remove(word)
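To see the skipping on a small scale, here is a minimal sketch that runs both versions of the loop on a six-word list. The function names and the sample list are made up for illustration and are not part of the original code.

```python
def remove_short_in_place(words, min_len=4):
    """Buggy version: removes items from the same list it is iterating over."""
    for word in words:
        if len(word) < min_len:
            words.remove(word)  # later items shift left, so the next one is skipped
    return words


def remove_short_from_copy(words, min_len=4):
    """Fixed version: iterates over a shallow copy, removes from the original."""
    for word in words[:]:
        if len(word) < min_len:
            words.remove(word)
    return words


# Hypothetical sample data, chosen so that short words sit next to each other.
sample = ["a", "be", "tree", "cat", "dog", "house"]

buggy = remove_short_in_place(list(sample))
fixed = remove_short_from_copy(list(sample))

print(buggy)  # ['be', 'tree', 'dog', 'house']
print(fixed)  # ['tree', 'house']
```

In the buggy run, "be" survives because it slides into the slot the loop has already visited once "a" is removed, and "dog" survives the same way after "cat" is removed.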
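In addition, you can replace the first for loop with a list comprehension, and there is no need to wrap word_file.read().split() in a list, because split() already returns a list. So your code would look like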
#load in the words from the original text file
def load_words():
    with open('words_alpha.txt') as word_file:
        #No need to wrap this into a list since it already returns a list
        valid_words = word_file.read().split()
    return valid_words
english_words = load_words()
#remove words under 4 letters using list comprehension
english_words = [word for word in english_words if len(word) >= 4]
print("done")
print(len(english_words))
#save the remaining words to a new text file
new_words = open("english_words_v3.txt","w")
for word in english_words:
    new_words.write(word)
    new_words.write("\n")
new_words.close()
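As a possible variation, the same steps could be wrapped in one function that opens both files with a with statement, so the output file is closed even if an error occurs while writing. This is only a sketch: the name filter_words and its parameters are assumptions made for illustration, not something from the question or the answer.

```python
def filter_words(src_path, dst_path, min_len=4):
    """Copy words of at least min_len characters from src_path to dst_path.

    Hypothetical helper: returns (words_read, words_kept) so the caller
    can print the before and after counts.
    """
    with open(src_path) as src:
        words = src.read().split()  # split() already returns a list

    kept = [word for word in words if len(word) >= min_len]

    with open(dst_path, "w") as dst:
        # One word per line, same layout as the original output file.
        dst.writelines(word + "\n" for word in kept)

    return len(words), len(kept)


# Example call, assuming words_alpha.txt is in the working directory:
# total, remaining = filter_words("words_alpha.txt", "english_words_v3.txt")
```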
Regarding "python - I'm trying to remove all words shorter than 4 characters from a list, but it doesn't work", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56169697/