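python - re.findall() fails to find lines from one file in another file

I have two text files: one contains the text of an article, and the other contains a list of phrasal verbs. I am trying to find every instance of each phrasal verb in the article. I know the article contains the phrasal verb "log on", and so does the phrasal verb list. When I loop over the phrasal verbs and search for each one with re.findall(), it finds nothing. When I manually start the loop at line 1199 of the phrasal verb list (which happens to be "log on"), it finds it. When I start at the previous line (line 1198), it does not. Here is my code: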

import re
PV_HI = []
file = open('article.txt')
for line in open('phrasalVerbs.txt'):
    pv = line.strip()
    pvFound = re.findall(pv, file.read(), flags=re.I)
    PV_HI.extend(pvFound)
print(PV_HI)

Here is a sample of the phrasal verb list file:

Lock onto
Lock out
Lock up
Lock away
Log in
Log into
Log off
Log on
Log out
Look after
Look back
Look down on
Look for
Look forward to
Look in
Look in on
Look into

And a sample of the article file:

<p> If you have a business account, a higher Pay Anyone limit up to $500,000 and also have a Security Device to authorise third party payments and/or can add Operators, you are an ANZ Internet Banking for Business customer.
<p> How do I manage my accounts once I am registered for ANZ Internet Banking?
<p> If you have registered for ANZ Internet Banking, use your CRN and password to log on to ANZ Internet Banking.
<p> If you need help while logged on to ANZ Internet Banking, click the " Help " icon in the top right hand corner of all pages.

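Ultimately, what I want is a count of all phrasal verbs across a set of 1600 files. If there is a better way to do this, I am certainly open to suggestions.

Thanks!

Matt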

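Best answer

I saved your phrasal verb sample and your article file (appending the string "log on" at the end as something to search for) and ran some tests with your Python code. At first I could not find any results either. But when I changed the code as follows: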

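import re

PV_HI = []
# Read the article once; a second read() on the same handle returns ''.
with open('article.txt', 'r') as f:
    article_content = f.read()

with open('phrasalVerbs.txt') as pv_file:
    for line in pv_file:
        pv = line.strip()
        # Search the cached text instead of re-reading the file object.
        pvFound = re.findall(pv, article_content, flags=re.I)
        PV_HI.extend(pvFound)
print(PV_HI)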

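it works and successfully finds "log on". The difference is that the article is read once, before the loop; in your version, file.read() consumes the whole file on the first iteration and returns an empty string on every later one. Hope that helps.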
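
To see the underlying behavior in isolation, here is a minimal sketch. It uses io.StringIO as a stand-in for the open article file (an assumption made only so the snippet runs without files on disk); a real file object opened in text mode behaves the same way with respect to read() and seek().

```python
import io

# io.StringIO stands in for open('article.txt'); this is an illustrative
# substitute, not part of the original code.
article = io.StringIO(
    "use your CRN and password to log on to ANZ Internet Banking."
)

first = article.read()   # returns the whole text; position is now at the end
second = article.read()  # nothing left to read, so this is an empty string

article.seek(0)          # rewinding makes the content readable again
third = article.read()

print(len(first) > 0, repr(second), third == first)
```

This would also explain the line 1199 versus line 1198 observation in the question: whichever phrasal verb comes first in the loop is searched against the full article, and every later one is searched against an empty string.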

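On the question's larger goal of counting phrasal verbs across many files, one possible approach is sketched below. It is not from the original answer: the function name count_phrasal_verbs and the sample data are hypothetical, and the use of re.escape and \b word boundaries is an assumption about what counts as a match.

```python
import re
from collections import Counter


def count_phrasal_verbs(text, phrasal_verbs):
    """Count case-insensitive, whole-word occurrences of each phrase in text.

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for pv in phrasal_verbs:
        if not pv:
            continue  # skip blank lines from the list file
        # re.escape guards against regex metacharacters in the list;
        # \b keeps a phrase from matching inside a longer word.
        pattern = r"\b" + re.escape(pv) + r"\b"
        n = len(re.findall(pattern, text, flags=re.I))
        if n:
            counts[pv] += n
    return counts


# Made-up sample data, loosely based on the article excerpt in the question.
sample_text = (
    "use your CRN and password to log on to ANZ Internet Banking. "
    "Log on again if you log out."
)
sample_verbs = ["Log in", "Log on", "Log out"]

totals = count_phrasal_verbs(sample_text, sample_verbs)
print(totals)
```

For a set of files, each file could be read once and its result added to a running Counter with update(). Note that this only matches the base form as listed, so an inflected form such as "logged on" would not be counted.
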
Original question on Stack Overflow: https://stackoverflow.com/questions/46801683/
