I have a JSON file containing the following text:
dr. goldberg offers everything.parking is good.he's nice and easy to talk
How can I extract the sentence that contains the keyword "parking"? I don't need the other two sentences.
I tried this:
with open("test_data.json") as f:
    for line in f:
        if "parking" in line:
            print(line)
It prints the whole text instead of that one sentence.
I even tried using a regular expression:
import re

f = open("test_data.json")
for line in f:
    line = line.rstrip()
    if re.search('parking', line):
        print(line)
Even this gives the same result.
Best answer
You can use nltk.tokenize:
from nltk.tokenize import sent_tokenize
from nltk.tokenize import word_tokenize

with open("test_data.json") as f:
    text = f.read()
sentences = sent_tokenize(text)
# Keep every sentence whose tokens include the target word.
my_sentence = [sent for sent in sentences if 'parking' in word_tokenize(sent)]
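Note that the filter checks membership in the token list rather than in the raw string, so it matches whole words only. Here is a minimal sketch of the difference. It uses a simple regex as a stand-in for word_tokenize so that it runs without NLTK; the helper simple_tokens and the sample sentence are made up for illustration, and word_tokenize itself treats punctuation and contractions differently:

```python
import re

def simple_tokens(sentence):
    # Hypothetical stand-in for word_tokenize: runs of letters, digits, and apostrophes.
    return re.findall(r"[A-Za-z0-9']+", sentence)

sentence = "the carparking fee is high."

# Substring test: succeeds because "parking" occurs inside "carparking".
substring_hit = "parking" in sentence

# Token test: fails because no single token equals "parking".
token_hit = "parking" in simple_tokens(sentence)

print(substring_hit, token_hit)
```

The token test is the stricter of the two, which is usually what "sentences containing the word parking" means.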
As a complete approach, you can wrap this in a function:
>>> def sentence_finder(text, word):
...     sentences = sent_tokenize(text)
...     return [sent for sent in sentences if word in word_tokenize(sent)]
...
>>> s = "dr. goldberg offers everything. parking is good. he's nice and easy to talk"
>>> sentence_finder(s, 'parking')
['parking is good.']
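Both attempts in the question iterate over lines, and the file holds all three sentences on one line, so the match prints the whole line. If installing NLTK is not an option, a rough approximation using only the standard library is to split on sentence-ending punctuation. This is a sketch, not a replacement for sent_tokenize. The name sentence_finder_simple is hypothetical, and the naive split treats every period as a boundary, so an abbreviation such as "dr." becomes its own fragment. The text in the question has no space after each period, unlike the string in the example above, and the sketch allows for that:

```python
import re

def sentence_finder_simple(text, word):
    # Naive split: each fragment runs up to and including the next '.', '!' or '?'.
    fragments = [frag.strip() for frag in re.findall(r"[^.!?]+[.!?]?", text)]
    matches = []
    for frag in fragments:
        # Compare whole words, case-insensitively.
        words = re.findall(r"[a-z0-9']+", frag.lower())
        if word.lower() in words:
            matches.append(frag)
    return matches

text = "dr. goldberg offers everything.parking is good.he's nice and easy to talk"
result = sentence_finder_simple(text, "parking")
print(result)  # ['parking is good.']
```

For text with many abbreviations, decimal numbers, or quoted speech, a trained tokenizer such as sent_tokenize is likely to give better sentence boundaries than this split.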
Source: the Stack Overflow question "Python: extract sentences with a particular word": https://stackoverflow.com/questions/27074905/