Python multiline regex replace

Yet another regex question, I know, and I feel bad about it, but this has been driving me crazy for the past week.

I'm trying to use a regular expression in Python to replace some text that looks like this:

text = """some stuff
line with text
other stuff
[code language='cpp']
#include <cstdio>
int main() {
    printf("Hello");
}
[/code]
Maybe some other text"""

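What I want to do is capture the text inside the [code] tags, prepend a tab (\t) to each line, and then replace the whole [code]...[/code] block with the new tab-indented lines. In other words, I want the result to look like this: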

"""some stuff
line with text
other stuff
	#include <cstdio>
	int main() {
	    printf("Hello");
	}
Maybe some other text"""
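The "prepend a tab to each line" step on its own can be done with `str.split` and `str.join`, which is what the code below in the question does. The standard library's `textwrap.indent` is an alternative, with one difference worth knowing: by default it skips lines that consist only of whitespace. A minimal sketch; `indent_with_tab` is a hypothetical helper name, not something from the post.

```python
import textwrap


def indent_with_tab(block: str) -> str:
    """Prefix every line of `block` with a tab (hypothetical helper)."""
    # split('\n') keeps empty lines, so blank lines also receive a tab
    return '\n'.join('\t' + line for line in block.split('\n'))


print(repr(indent_with_tab('a\n\nb')))        # '\ta\n\t\n\tb'
# textwrap.indent leaves whitespace-only lines untouched by default
print(repr(textwrap.indent('a\n\nb', '\t')))  # '\ta\n\n\tb'
```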

I'm using the following snippet:

import re


class CodeParser(object):
    """Parse a blog post and turn it into markdown."""

    def __init__(self):
        self.regex = re.compile(r'.*\[code.*?\](?P<code>.*)\[/code\].*', re.DOTALL)

    def parse_code(self, text):
        """Parses code section from a wp post into markdown."""
        code = self.regex.match(text).group('code')
        code = ['\t%s' % s for s in code.split('\n')]
        code = '\n'.join(code)
        return self.regex.sub('\n%s\n' % code, text)

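The problem is that, because of the leading and trailing .*, the pattern matches every character before and after the code tags, so all of that text is removed when I do the substitution. If I remove the .*, the re no longer matches anything.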
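Both symptoms can be reproduced on a small made-up string (the `text` below is hypothetical, not the post itself). With `re.DOTALL` the outer `.*` also consume newlines, so the match covers the entire input and `sub()` replaces all of it; without them, `match()` fails because it only tries to match at position 0.

```python
import re

# Hypothetical sample post with text on both sides of the [code] block
text = "before\n[code]\nx = 1\n[/code]\nafter"

# The question's pattern: leading and trailing .* plus DOTALL
greedy = re.compile(r'.*\[code.*?\](?P<code>.*)\[/code\].*', re.DOTALL)

# The match spans the whole string, from index 0 to the end
print(greedy.match(text).span() == (0, len(text)))  # True
# ...so sub() replaces the surrounding lines as well
print(greedy.sub('REPLACED', text))                  # REPLACED

# Without the outer .*, match() fails: it only tries position 0,
# and the text starts with "before", not "[code"
bare = re.compile(r'\[code.*?\](?P<code>.*)\[/code\]', re.DOTALL)
print(bare.match(text))                              # None
```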

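I thought this might be a newline problem, so I tried replacing every '\n' with '¬', doing the match, and then changing '¬' back to '\n', but that approach didn't work out either.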
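Newlines are not the culprit here, because the pattern is already compiled with `re.DOTALL`, which makes `.` match `\n` too. The sketch below (on a hypothetical string) shows the difference the flag makes, which is why the '¬' substitution trick is unnecessary.

```python
import re

text = "[code]\nline one\nline two\n[/code]"
pattern = r'\[code\](.*)\[/code\]'

# Without DOTALL, '.' stops at '\n', so the body cannot span several lines
print(re.search(pattern, text))                            # None
# With DOTALL, '.' also matches '\n' and the whole body is captured
print(repr(re.search(pattern, text, re.DOTALL).group(1)))  # '\nline one\nline two\n'
```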

If anyone has a better way to accomplish what I'm trying to do, I'm open to suggestions.

Thanks.

Best answer

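You're on the right track. Use regex.search instead of regex.match: match only tries to match at the beginning of the string, while search scans the string for the first position where the pattern fits. That way you can get rid of the leading and trailing .*s.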
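A minimal sketch of the difference, on a hypothetical string: with the outer `.*` removed, `match()` returns `None`, `search()` finds the block, and `sub()` now replaces only the `[code]...[/code]` span while leaving the surrounding lines intact.

```python
import re

text = "intro\n[code]\nx = 1\n[/code]\noutro"
pattern = re.compile(r'\[code.*?\](?P<code>.*)\[/code\]', re.DOTALL)

# match() only tries position 0, where the text starts with "intro"
print(pattern.match(text))                       # None
# search() scans forward until the pattern fits
print(repr(pattern.search(text).group('code')))  # '\nx = 1\n'
# The match no longer covers the surrounding lines, so sub() keeps them
print(repr(pattern.sub('[replaced]', text)))     # 'intro\n[replaced]\noutro'
```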

Try this:

import re


class CodeParser(object):
    """Parse a blog post and turn it into markdown."""

    def __init__(self):
        # No leading/trailing .* needed, because search() is used below
        self.regex = re.compile(r'\[code.*?\](?P<code>.*)\[/code\]', re.DOTALL)

    def parse_code(self, text):
        """Parses code section from a wp post into markdown."""
        # Here we are using search which finds the pattern anywhere in the
        # string rather than just at the beginning
        code = self.regex.search(text).group('code')
        code = ['\t%s' % s for s in code.split('\n')]
        code = '\n'.join(code)
        return self.regex.sub('\n%s\n' % code, text)
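The answer's code assumes a post with exactly one [code] block. A sketch of a more general variant, assuming a post may contain several blocks or none: the body is made non-greedy (`.*?`) so each block is matched on its own, the tag attributes are matched with `[^\]]*`, and a function is passed to `re.sub` so every match is indented from its own body. A function's return value is also inserted literally, whereas a string replacement has its backslash escapes processed (a `\n` inside `printf("Hello\n")` would otherwise become a real newline). The names `CODE_BLOCK`, `_indent_block` and the module-level `parse_code` are hypothetical, and stripping the newlines directly after `[code]` and before `[/code]` (so no tab-only lines appear) is an assumption about the desired output.

```python
import re

# Non-greedy body so each [code]...[/code] pair is matched separately;
# [^\]]* accepts attributes such as language='cpp' inside the opening tag
CODE_BLOCK = re.compile(r'\[code[^\]]*\](?P<code>.*?)\[/code\]', re.DOTALL)


def _indent_block(match: re.Match) -> str:
    """Replacement callback: tab-indent the body of one [code] block."""
    # Assumption: drop the newlines right after [code] and before [/code]
    body = match.group('code').strip('\n')
    indented = '\n'.join('\t' + line for line in body.split('\n'))
    return '\n%s\n' % indented


def parse_code(text: str) -> str:
    """Replace every [code]...[/code] block with a tab-indented version."""
    # sub() calls _indent_block once per match and inserts its result as-is;
    # text without any [code] block is returned unchanged
    return CODE_BLOCK.sub(_indent_block, text)


post = 'intro\n[code]\nprintf("Hello\\n");\n[/code]\nmiddle\n[code lang="py"]\nx = 1\n[/code]\nend'
print(repr(parse_code(post)))
# 'intro\n\n\tprintf("Hello\\n");\n\nmiddle\n\n\tx = 1\n\nend'
```

Unlike `self.regex.search(text).group('code')`, which raises `AttributeError` when `search` returns `None`, this version simply leaves a post without [code] tags as it is.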

A similar question about Python multiline regex replacement can be found on Stack Overflow: https://stackoverflow.com/questions/31361264/
