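I am trying to extract words from a file that contains about 90,000 lines (each line has anywhere from three to a few hundred words). I want to append the lines to a list after stemming them. So far I can insert the stemmed words into a list, but everything ends up on a single line. I want to insert the words into the list while keeping the 90,000 lines separate. Any ideas?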
clean_sentence = []
with open(folder_path + text_file_name, 'r', encoding='utf-8') as f:
    for line in f:
        sentence = line.split()
        for word in sentence:
            if word.endswith('er'):
                clean_sentence.append(word[:-2])
            else:
                clean_sentence.append(word)
x = ' '.join(clean_sentence)
with open('StemmingOutFile.txt', 'w', encoding="utf8") as StemmingOutFile:
    StemmingOutFile.write(x)
The file is not in English, but here is an example that illustrates the problem. The current code yields:
why don't you like to watch TV? are there any more fruits? why not?
I want the output file to be:
why don't you like to watch TV?
are there any more fruits?
why not?
Best answer
Read the file line by line:
with open('file.txt', 'r') as f:
    lines = f.read().splitlines()
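Then do the stemming, one line at a time:
new_lines = []
for line in lines:
    # Split each line into words, stem each word, then rejoin the line.
    new_lines.append(' '.join(stemmed(word) for word in line.split()))
where stemmed is a function like this: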
def stemmed(word):
    return word[:-2] if word.endswith('er') else word
Then write each line of new_lines to StemmingOutFile.txt, one per line.
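The steps above can be put together roughly as follows. This is only a minimal sketch: the helper name stem_file, the temporary input file, and the sample sentences are made up for illustration, and stemmed is the same toy "strip a trailing 'er'" rule from the answer, not a real stemmer. It streams the input instead of loading all lines first, which should be gentler on memory for a file of about 90,000 lines.

```python
import os
import tempfile


def stemmed(word):
    # Toy rule from the answer: drop a trailing 'er', otherwise keep the word.
    return word[:-2] if word.endswith('er') else word


def stem_file(in_path, out_path):
    # Hypothetical helper (not from the original answer): stems every word
    # and writes exactly one output line per input line.
    with open(in_path, 'r', encoding='utf-8') as src, \
         open(out_path, 'w', encoding='utf-8') as dst:
        for line in src:
            # split() discards the trailing newline, so add one back.
            dst.write(' '.join(stemmed(w) for w in line.split()) + '\n')


# Small demonstration using a temporary directory and made-up sentences.
tmp_dir = tempfile.mkdtemp()
in_path = os.path.join(tmp_dir, 'input.txt')
out_path = os.path.join(tmp_dir, 'StemmingOutFile.txt')
with open(in_path, 'w', encoding='utf-8') as f:
    f.write('the baker is older\nwater under the bridge\nwhy not?\n')

stem_file(in_path, out_path)

with open(out_path, encoding='utf-8') as f:
    result = f.read().splitlines()
print(result)
```

The key point is that the join and the write happen once per line, inside the loop, so the output keeps the same number of lines as the input.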
Regarding "python - How to append lines from a file to a list while preserving the number of lines - Python 3", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48754613/