python - Extracting nouns with POS tagging in Python (loop)

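I want to extract only the nouns or noun groups from a huge text file. The Python code below runs, but it only extracts the nouns from the last line. I'm fairly sure the code needs an "append" somewhere, but I don't know how to add it (I'm a Python beginner).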

import nltk
import pos_tag
import nltk.tokenize 
import numpy

f = open(r'infile.txt', encoding="utf8")
data = f.readlines()

tagged_list = []

for line in data:
    tokens = nltk.word_tokenize(line)
    tagged = nltk.pos_tag(tokens)
    nouns = [word for word,pos in tagged \
            if (pos == 'NN' or pos == 'NNP' or pos == 'NNS' or pos == 'NNPS')]
    downcased = [x.lower() for x in nouns]
    joined = " ".join(downcased).encode('utf-8')
    into_string = str(nouns)

output = open(r"outfile.csv", "wb")
output.write(joined)
output.close()

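The output looks like this: "downtown apartment transportation", which are the nouns from the last line of the file. I want the nouns from each line of the input saved on their own line of the output. For example, the input file and the corresponding result should look like this.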

Input file:
I like the milk.
I like the milk and bread.
I like the milk, bread, and butter.

Output file:
milk
milk bread
milk bread butter

I hope someone can help me fix the code above.

Best answer

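At the end of the for loop body, add a line that accumulates each line's result, then write the accumulated result to the file after the loop finishes.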

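...
result = b""  # bytes, because joined was produced with .encode('utf-8')
for line in data:
    ...
    # add this line's nouns plus a newline instead of overwriting
    result += joined + b"\n"

# "wb" matches the bytes built above, as in the question's code
output = open(r"outfile.csv", "wb")
output.write(result)
output.close()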
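
To see why the original loop keeps only the last line, here is a minimal sketch using plain strings instead of NLTK output. The sample data and the names per_line, last_only and accumulated are made up for illustration. Rebinding a name on every pass leaves only the final value, while accumulating keeps every pass.

```python
# Hypothetical per-line results standing in for the nouns found on each line.
per_line = ["milk", "milk bread", "milk bread butter"]

# Overwriting: the name is rebound on every pass, so only the last value survives.
for item in per_line:
    last_only = item

# Accumulating: each pass adds to a string that outlives the loop.
accumulated = ""
for item in per_line:
    accumulated += item + "\n"

print(repr(last_only))    # only the final line's value
print(repr(accumulated))  # every line's value, newline-separated
```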

If you want to use append:

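...
result_list = []
for line in data:
    ...
    # collect this line's nouns (bytes, since joined was encoded)
    result_list.append(joined)

# write one entry per line; "wb" because the items are bytes
output = open(r"outfile.csv", "wb")
output.write(b"\n".join(result_list) + b"\n")
output.close()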

Alternatively, if you use the result list, you can write the items out one at a time:

...
output = open(r"outfile.csv", "wb")  # binary mode: the items are bytes
for item in result_list:
    output.write(item + b"\n")
output.close()
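
Putting the pieces together, one possible end-to-end version is sketched below. It is not the asker's or the answerer's exact code: the helper names (extract_nouns, nltk_tag_line, nouns_per_line, write_nouns) are hypothetical, it works with text rather than encoded bytes, and it assumes NLTK is installed with its tokenizer and tagger data downloaded. The tagger is passed in as a parameter so the filtering logic can be tried without NLTK.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}  # the four noun tags checked in the question


def extract_nouns(tagged):
    """Return the lowercased nouns from a list of (word, tag) pairs."""
    return [word.lower() for word, pos in tagged if pos in NOUN_TAGS]


def nltk_tag_line(line):
    """Tokenize and POS-tag one line with NLTK (imported lazily)."""
    import nltk  # assumes the tokenizer and tagger data are already downloaded
    return nltk.pos_tag(nltk.word_tokenize(line))


def nouns_per_line(lines, tag_line=nltk_tag_line):
    """Return one space-joined string of nouns for each input line."""
    return [" ".join(extract_nouns(tag_line(line))) for line in lines]


def write_nouns(in_path, out_path, tag_line=nltk_tag_line):
    """Write the nouns of each line of in_path to the matching line of out_path."""
    with open(in_path, encoding="utf8") as infile:
        results = nouns_per_line(infile, tag_line)
    with open(out_path, "w", encoding="utf8") as outfile:
        for nouns in results:
            outfile.write(nouns + "\n")
```

With the question's file names, the call would be write_nouns("infile.txt", "outfile.csv"). Using with blocks closes both files even if tagging raises an error partway through.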

Regarding python - Extracting nouns with POS tagging in Python (loop), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46367383/
