Python re: matching whitespace and newlines

Tags: python regex

html6="""
<p<ins style="background:#e6ffe6;">re><code</ins>>
int aint bint c<ins style="background:#e6ffe6;"></code></ins></p<ins style="background:#e6ffe6;">re</ins>><p>int d</p>
"""

html6 and html7 are the same, except that html7 contains "\n" line breaks.

html7="""
<p<ins style="background:#e6ffe6;">re><code</ins>>int a
int b
int c<ins style="background:#e6ffe6;">
</code></ins></p<ins style="background:#e6ffe6;">re</ins>>
<p>int d</p>
"""

import re

# Note: under re.VERBOSE, whitespace and line breaks inside the pattern are ignored.
p_to_pre_code_pattern = re.compile(
r"""<p
<(?P<action_tag>(del|ins)) (?P<action_attr>.*)>re><code</(?P=action_tag)>
>
(?P<text>.*?)
<(?P=action_tag) (?P=action_attr)>
</code></(?P=action_tag)>
</p
<(?P=action_tag) (?P=action_attr)>re</(?P=action_tag)>
>""", re.VERBOSE)


print(p_to_pre_code_pattern.match(html6))
print(p_to_pre_code_pattern.match(html7))

Neither html6 nor html7 matches. But if I replace "\n" with "", both of them match.

print(p_to_pre_code_pattern.match(html6.replace("\n", "")))
print(p_to_pre_code_pattern.match(html7.replace("\n", "")))

How should I change p_to_pre_code_pattern so that it matches both html6 and html7 without calling replace("\n", "")?

Best answer

Perhaps you are missing the re.DOTALL flag when calling re.compile, i.e. re.compile(..., re.VERBOSE | re.DOTALL).

re.S 
re.DOTALL 

Make the '.' special character match any character at all, including a newline;
without this flag, '.' will match anything except a newline.
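
As a minimal sketch of how the flag could be applied to the strings from the question (not necessarily the asker's final code): re.DOTALL lets (?P<text>.*?) span line breaks. The sketch also makes three changes that the answer does not mention, so treat them as assumptions. It uses search() instead of match(), because both strings as shown begin with a newline. It adds \s* where the input may contain a line break between tags, because under re.VERBOSE the whitespace written in the pattern is ignored and never matches anything in the input. And it narrows the attribute group to [^>]* so it cannot run past the end of the tag.

```python
import re

html6 = """
<p<ins style="background:#e6ffe6;">re><code</ins>>
int aint bint c<ins style="background:#e6ffe6;"></code></ins></p<ins style="background:#e6ffe6;">re</ins>><p>int d</p>
"""

html7 = """
<p<ins style="background:#e6ffe6;">re><code</ins>>int a
int b
int c<ins style="background:#e6ffe6;">
</code></ins></p<ins style="background:#e6ffe6;">re</ins>>
<p>int d</p>
"""

pattern = re.compile(
    r"""
    <p
    <(?P<action_tag>del|ins)\s+(?P<action_attr>[^>]*)>re><code</(?P=action_tag)>
    >
    (?P<text>.*?)                  # DOTALL lets this span line breaks
    <(?P=action_tag)\s+(?P=action_attr)>
    \s*                            # optional line break before </code>
    </code></(?P=action_tag)>
    \s*
    </p
    <(?P=action_tag)\s+(?P=action_attr)>re</(?P=action_tag)>
    >
    """,
    re.VERBOSE | re.DOTALL,
)

# search() scans forward, so the leading newline in each string is skipped.
m6 = pattern.search(html6)
m7 = pattern.search(html7)

print(m6 is not None, m7 is not None)
print(repr(m7.group("text")))
```

If match() is preferred over search(), a leading \s* at the start of the pattern would serve the same purpose.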

This question about matching whitespace and newlines with Python's re module comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/9536313/
