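I'm trying to convert multiple matches in a paragraph into links while preserving the surrounding text in the final output. The pattern I'm matching is reminiscent of Markdown's hyperlink syntax, as a way to let non-technical users mark which text they want linked in their input (a Google Sheet that I access via the Sheets API/Python). The first group I capture is the link text, and the second is the value for a key in the query string.
I've been able to match a single instance of this pattern, but my replacement string replaces the entire paragraph in the output.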
import re

text = (
    "2018 was a big year for my sourdough starter and me. Mostly "
    "we worked on developing this [tangy bread](19928) and these [chewy "
    "rolls] (9843). But we were also just content keeping each other "
    "company and inspired to bake."
)

def link_inline(text):
    # expand a proper link around recipe id
    ref = re.search(r"(\[.*?\]\(\d+\))", text, re.MULTILINE).group(1)
    if len(ref) > 0:
        link = re.sub(r"\[(.*?)\]\((\d+)\)", r"<a href='https://www.foo.com/recipes?rid=\2'>\1</a>", ref)
        return text
    else:
        return "replacement failed"
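The goal is for the output to keep the paragraph intact and simply replace each match of the pattern \[(.*?)\]\((\d+)\) with the following string, including backreferences to the groups: <a href="https://www.foo.com?bar=\2">\1</a>
So it presumably needs to loop over the text to replace all matches (perhaps with re.finditer?) while keeping the original text outside the pattern matches. But I'm not sure how to define the loop correctly and perform the replacement without overwriting the entire paragraph with my replacement string.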
Best answer
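I used re.compile, and instead of putting parentheses around the whole match, I put one pair of parentheses around .*? and another pair around \d+, since those two parts are the text we want to extract and place into the link. Calling sub on the compiled pattern replaces every match and leaves the rest of the text untouched, so no explicit loop is needed.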
import re

def link_inline(text):
    # expand a proper link around recipe id
    # group 1 is the link text, group 2 is the recipe id
    ref = re.compile(r"\[(.*?)\]\((\d+)\)")
    replacer = r'<a href="https://www.foo.com/recipes?rid=\2">\1</a>'
    # sub replaces every non-overlapping match and keeps the surrounding text
    return ref.sub(replacer, text)

text = """
2018 was a big year for my sourdough starter and me. Mostly we worked on
developing this [tangy bread](19928) and
these [chewy rolls](9843). But we were also just
content keeping each other company and inspired to bake.
"""
print(link_inline(text))
Output:
2018 was a big year for my sourdough starter and me. Mostly we worked on
developing this <a href="https://www.foo.com/recipes?rid=19928">tangy bread</a> and
these <a href="https://www.foo.com/recipes?rid=9843">chewy rolls</a>. But we were also just
content keeping each other company and inspired to bake.
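If the replacement ever needs more logic than backreferences allow, re.sub also accepts a function in place of the replacement string. The following is only a sketch under assumptions not stated in the question: the names LINK_PATTERN, build_anchor and link_inline_escaped are hypothetical, and escaping the link text with html.escape is an extra precaution for spreadsheet input rather than something the original answer does.

```python
import html
import re

# Same pattern as in the answer: group 1 is the link text, group 2 the recipe id.
LINK_PATTERN = re.compile(r"\[(.*?)\]\((\d+)\)")

def build_anchor(match):
    # Hypothetical helper: escape the link text so characters such as & or <
    # coming from the sheet are not emitted as raw HTML.
    label = html.escape(match.group(1))
    recipe_id = match.group(2)
    return f'<a href="https://www.foo.com/recipes?rid={recipe_id}">{label}</a>'

def link_inline_escaped(text):
    # re.sub calls build_anchor once per match; text outside matches is unchanged.
    return LINK_PATTERN.sub(build_anchor, text)

print(link_inline_escaped("Try [bread & butter](42) or [chewy rolls](9843) today."))
```

As with the string replacement, only the matched spans are rewritten, so the paragraph around them stays intact.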
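As a sanity check, I tried adding some extra strings with parentheses and square brackets that are not links, such as (this) here and [this] here, to the text string. Everything still works.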
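One case that check may not cover is a bare bracketed phrase that sits on the same line as, and before, a real link. Because .*? is allowed to match a closing bracket, the lazy group can stretch from the first [ all the way to the link's ](id). A possible tightening, offered as a sketch rather than as part of the original answer, is to forbid ] inside the link text; the names LAZY, STRICT, REPLACER and sample are illustrative.

```python
import re

# Pattern from the answer: the lazy group may run across a stray "]".
LAZY = re.compile(r"\[(.*?)\]\((\d+)\)")
# Assumed stricter variant: the link text may not contain "]".
STRICT = re.compile(r"\[([^\]]*)\]\((\d+)\)")
REPLACER = r'<a href="https://www.foo.com/recipes?rid=\2">\1</a>'

sample = "See [this] here, then try the [tangy bread](19928)."

# The lazy pattern starts matching at "[this" and swallows the text in between.
print(LAZY.sub(REPLACER, sample))
# The strict pattern links only "tangy bread" and leaves "[this]" alone.
print(STRICT.sub(REPLACER, sample))
```

When the stray brackets are on a different line from the link, both patterns behave the same, since . does not match a newline by default.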
Regarding "python - How to replace multiple regex matches in a block of text", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53876572/