python - 使正则表达式适应 python re 模块

我有一个正则表达式可以用来删除文件中 div id="content" 之前的所有内容并包括/在<div id="footer"之后

([\s\S]*)(?=<div id="content")|(?=<div id="footer)([\s\S]*)

我正在使用 re 模块来处理 python 中的正则表达式。我在 python 中使用的代码:

file = open(file_dir)
content = file.read()
result = re.search('([\s\S]*)(?=<div id="content")|(?=<div id="footer)([\s\S]*))', content)

我也尝试过使用 re.match。我无法返回我想要的内容。现在我只能让它返回 div#content 之前的所有内容

最佳答案

虽然不是advisable ，您可以提取您的内容而不是简单地匹配它:

import re

rx = re.compile(r'''
        .*?
        (
            <div\ id="content"
            .+?
        )
        <div\ id="footer
        ''', re.VERBOSE | re.DOTALL)

content = rx.findall(your_string_here, 1)[0]
print(content)

这产生

<div id="content" class="other">
i have this other stuff 
<div>More stuff</div>

参见 a demo on regex101.com .更好的是:使用解析器，例如BeautifulSoup相反。

关于python - 使正则表达式适应 python re 模块，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/44765301/

上一篇：python - 如何命令自动化测试停留在监视器的一个地方

下一篇：python - 将类的属性和确定它的方法分开是否可以/pythonic？

Python，映射类或函数

regex - 如何为 Github 设置此预提交消息 Hook ？

python - 使用 Python 从 csv 文件读取特定日期范围

python - 如何在 pygame 中添加敌人？

python - 如何将具有相同索引的两个单独列表的组合元素插入到另一个列表？

python - 将 curl 命令转换为 request.post

javascript - 如何为 id 验证编写正则表达式(JavaScript)？

javascript - 在 Powershell 中在此对象上找不到属性 'value'

Python 从 HTTP 响应中提取 JSON