I am trying to search a text for a list of words, so I wrote the following code:
narrative = "Lasix 40 mg b.i.d., for three days along with potassium chloride slow release 20 mEq b.i.d. for three days, Motrin 400 mg q.8h"
meds_name_final_list = ["lasix", "potassium chloride slow release", ...]
def all_occurences(file, str):
    initial = 0
    while True:
        initial = file.find(str, initial)
        if initial == -1:
            return
        yield initial
        initial += len(str)
offset = []
for item in meds_name_final_list:
    number = list(all_occurences(narrative.lower(), item))
    offset.append(number)
Desired output: for each word being searched, a list of its starting indices in the corpus, for example:
offset = [[1], [3, 10], [5, 50].....]
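This code works well for terms that are not too long, such as antibiotics, emergency ward, insulin, etc. However, the function above does not detect long terms that are broken across lines by a newline.
Desired term: potassium chloride slow release
Any suggestions for solving this?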
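The failure described above can be reproduced with a short sketch. The `wrapped` string here is a made-up sample (not the original narrative) in which the term is split by a line break; `str.find` compares characters literally, so the newline never matches the space inside the search term.

```python
# Hypothetical sample: the same term, but wrapped across two lines.
wrapped = "along with potassium chloride\nslow release 20 mEq b.i.d."
phrase = "potassium chloride slow release"

# The "\n" in the text does not equal the " " in the phrase, so no match.
print(wrapped.find(phrase))                     # -1

# Replacing the newline with a space makes the literal search succeed.
print(wrapped.replace("\n", " ").find(phrase))  # 11
```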
Best answer
How about this?
def all_occurences(file, str):
    initial = 0
    file = file.replace('\n', ' ')
    while True:
        initial = file.find(str, initial)
        if initial == -1: return
        yield initial
        initial += len(str)
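Because each newline is swapped for exactly one space, the offsets yielded by the function above still line up with the original text. That approach would not match if the break is a Windows-style "\r\n" or if the wrapped line starts with indentation. A possible alternative, offered only as a sketch and not as part of the original answer, is to build a regular expression in which every space of the search term may match any run of whitespace. The function name `all_occurrences_ws` and the `sample` string are hypothetical.

```python
import re

def all_occurrences_ws(text, phrase):
    # Yield start offsets of `phrase` in `text`. Each gap between words of
    # the phrase may be any whitespace run (spaces, tabs, LF, CRLF).
    # re.escape keeps characters such as "." literal inside each word.
    pattern = r"\s+".join(re.escape(word) for word in phrase.split())
    # finditer returns non-overlapping matches, like the original loop.
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        yield match.start()

# Hypothetical sample with a CRLF line break followed by indentation.
sample = "Lasix 40 mg b.i.d., with potassium chloride\r\n   slow release 20 mEq"
offsets = [list(all_occurrences_ws(sample, name))
           for name in ["lasix", "potassium chloride slow release"]]
print(offsets)  # [[0], [25]]
```

Since the search runs on the unmodified text with `re.IGNORECASE`, the offsets refer to the original string and no separate `.lower()` call is needed.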
Regarding python - finding a long word broken by a new line, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55015433/