python - Regex: balancing "{}" in a complex regular expression (Python)

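I am trying to use a regular expression to extract information from a complex string. I want to capture everything between the first { and the last } as the content. Unfortunately, I am struggling with the nested {}. How can I handle this?

I think the key is to balance all the {} in the regular expression, but so far I have not succeeded. See this example for parentheses: Regular expression to match balanced parentheses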

import re

my_string = """
extend mineral Uraninite {
    kinetics {
        rate = -3.2e-08 mol/m2/s
        area = Uraninite
        y-term, species = Uraninite
        w-term {
            species = H[+]
            power = 0.37
        }
    }
    kinetics {
        rate = 3.2e-09 mol/m2/s
        area = Uraninite
        y-term, species = Uraninite
        w-term {
            species = H[+]
            power = 0.37
        }
    }
}
"""

regex = re.compile(
        r"extend\s+"
        r"(?:(?P<phase>colloid|mineral|basis|isotope|solid-solution)\s+)?"
        r"(?P<species>[^\n ]+)\s+"
        r"{(?P<content>[^}]*)}\n\s+}")
extend_list = [m.groupdict() for m in regex.finditer(my_string)]

So far, I get:

print(extend_list[0]["content"])

"""
    kinetics {
        rate = -3.2e-08 mol/m2/s
        area = Uraninite
        y-term, species = Uraninite
        w-term {
            species = H[+]
            power = 0.37
"""

It seems I need to use the regex package, because re does not support recursion. Indeed, this appears to work:

import regex as re
pattern = re.compile(r"{(?P<content>((?:[^{}]|(?R))*))}")
extend_list2 = [m.groupdict() for m in pattern.finditer(my_string)]

print(extend_list2[0]["content"])

"""
kinetics {
        rate = -3.2e-08 mol/m2/s
        area = Uraninite
        y-term, species = Uraninite
        w-term {
            species = H[+]
            power = 0.37
        }
    }
    kinetics {
        rate = 3.2e-09 mol/m2/s
        area = Uraninite
        y-term, species = Uraninite
        w-term {
            species = H[+]
            power = 0.37
        }
    }
"""

But plugging it into the main pattern does not work.

pattern = re.compile(
        r"extend\s+([^n]*)"
        r"(?:(?P<phase>colloid|mineral|basis|isotope|solid-solution)\s+)?"
        r"(?P<species>[^\n ]+)\s+"
        r"{(?P<content>((?:[^{}]|(?R))*))\}")
extend_list = [m.groupdict() for m in pattern.finditer(my_string)]

Best answer

I believe the current regular expression can be written as

rx = r"extend\s+(.*)(?:(?P<phase>colloid|mineral|basis|isotope|solid-solution)\s+)?(?P<species>\S+)\s+({(?P<content>((?:[^{}]++|(?4))*))})"

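(?R) is replaced with a regex subroutine call: ({(?P<content>((?:[^{}]++|(?4))*))}). (?R) recurses into the whole pattern, including the extend prefix, which cannot match inside a nested block, whereas a subroutine call recurses into a single group only. The braces group is Group 4, so the subroutine call is written as (?4). The original answer linked to an online demo of this pattern.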
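
As a rough, runnable sketch of the subroutine idea (a variant, not the answer's exact pattern): the version below uses the third-party regex package and a named group, here called block (a name chosen only for this illustration), so the call can be written (?&block) instead of counting group numbers. It assumes the phase and species sit on the same line as extend, and it leaves out the leading (.*) group. The sample string is a shortened copy of the one in the question.

```python
import regex  # third-party package (pip install regex); the stdlib re has no recursion

# Shortened copy of the question's input, still with nested braces.
my_string = """
extend mineral Uraninite {
    kinetics {
        rate = -3.2e-08 mol/m2/s
        w-term {
            species = H[+]
        }
    }
    kinetics {
        rate = 3.2e-09 mol/m2/s
    }
}
"""

pattern = regex.compile(
    r"extend[^\S\n]+"  # spaces or tabs, but stay on the same line
    r"(?:(?P<phase>colloid|mineral|basis|isotope|solid-solution)[^\S\n]+)?"
    r"(?P<species>\S+)\s+"
    # "block" (illustrative name) is one balanced {...}; (?&block) re-enters
    # only this group, never the "extend ..." prefix.
    r"(?P<block>\{(?P<content>(?:[^{}]++|(?&block))*)\})"
)

extend_list = [m.groupdict() for m in pattern.finditer(my_string)]
first = extend_list[0]
print(first["phase"], first["species"])
print(first["content"])
```

Naming the group keeps the call valid if groups are later added or removed in front of it, whereas a numbered call such as (?4) has to be renumbered by hand.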

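[^n]* looks like a typo: it matches zero or more characters other than the letter n. I used .* instead, which matches zero or more characters other than line breaks, as many as possible.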
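
To make the difference concrete, here is a small sketch using only the standard re module on the first two lines of the question's input (the variable names are illustrative). The last part checks, under the assumption that the named groups follow directly as in the pattern above, what a greedy .* leaves for them.

```python
import re

header = "extend mineral Uraninite {\n    kinetics {"

# [^n]* : any character except the letter "n" (newlines allowed),
# so the capture stops right before the "n" in "mineral".
negated = re.match(r"extend\s+([^n]*)", header).group(1)
print(repr(negated))  # 'mi'

# .* : any character except a newline, as many as possible.
dot = re.match(r"extend\s+(.*)", header).group(1)
print(repr(dot))  # 'mineral Uraninite {'

# Followed by the named groups, the greedy .* gives back only what it must.
m = re.match(
    r"extend\s+(.*)"
    r"(?:(?P<phase>colloid|mineral|basis|isotope|solid-solution)\s+)?"
    r"(?P<species>\S+)\s+\{",
    header,
)
print(m.group(1), m.group("phase"), m.group("species"))  # mineral Uraninit None e
```

If phase and species are the values you need, the last result suggests dropping the leading (.*) group or making it lazy, since the greedy form consumes the phase and most of the species name.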

[^\n ] looks like an attempt to match a run of non-whitespace characters, so I suggest \S here.
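
A quick way to see where the two differ, assuming a tab may appear between tokens (the sample line is made up for this illustration):

```python
import re

line = "extend mineral\tUraninite {"

# [^\n ]+ excludes only newline and space, so the tab stays inside a token.
loose = re.findall(r"[^\n ]+", line)
print(loose)  # ['extend', 'mineral\tUraninite', '{']

# \S+ excludes every whitespace character, tab included.
strict = re.findall(r"\S+", line)
print(strict)  # ['extend', 'mineral', 'Uraninite', '{']
```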

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/70044589/
