python - Regex: "don't return the rest" if "this condition"?

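I am using RegEx to search a multi-line string that contains a list of file paths.

The goal: if the match is in a folder name, return only that folder path (and none of its subfolders, even if they also match). If the match is in the file name, return the whole line (the full file path).

The pattern I am currently using, which returns the whole string: .*([^\\]*(John|Smith|Junior)){2}.*

Expected results: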

C:\temp\John Smith Junior\file.pdf -> C:\temp\John Smith Junior\
C:\temp\John Smith Junior\John Smith Junior\file.pdf -> C:\temp\John Smith Junior\
C:\temp\John Smith Junior file.pdf -> C:\temp\John Smith Junior file.pdf

I tried adding things to the end of the pattern, such as [\\n] or (\|\n) or (?!=.+\), but none of them works quite the way I want. Thanks for your help!

Demo: https://regex101.com/r/98d6Ed/1

.*([^\\]*(John|Smith|Junior)){2}.*

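Best answer

(John|Smith|Junior) is an alternation: it matches exactly one of the alternatives John, Smith, or Junior.

If you want to match the whole string John Smith Junior, you can put it in the pattern literally.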
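To illustrate the difference, here is a minimal sketch comparing the alternation with the literal name. The second path (a folder named only Smith) is a made-up example, not taken from the question.

```python
import re

path = r"C:\temp\John Smith Junior\file.pdf"
other = r"C:\temp\Smith\file.pdf"  # hypothetical path for illustration

alternation = re.compile(r"John|Smith|Junior")
literal = re.compile(r"John Smith Junior")

# The alternation finds each name on its own.
names_found = alternation.findall(path)
# The literal pattern only matches the three words in that exact order.
literal_found = literal.findall(path)

print(names_found)
print(literal_found)

# A path containing just one of the names satisfies the alternation,
# but not the literal pattern.
print(bool(alternation.search(other)), bool(literal.search(other)))
```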

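In Python re, you can use a conditional (an if clause) to test for a \ right after the first occurrence of Junior.

If the \ is there, the match ends right after it; otherwise .* matches the rest of the line.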

^.*?\bJunior\b(\\)?(?(1)|.*)

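^ Start of string
.*?\bJunior\b Match as few characters as possible up to the first occurrence of Junior as a whole word
(\\)? Optionally capture a \ in group 1
(?(1)|.*) Conditional: (?(1) tests whether group 1 took part in the match. If it did, match nothing more; otherwise match the rest of the line with .*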


import re

# Sample paths from the question
strings = [
    r"C:\temp\John Smith Junior\file.pdf",
    r"C:\temp\John Smith Junior\John Smith Junior\file.pdf",
    r"C:\temp\John Smith Junior file.pdf"
]

for s in strings:
    # re.match already anchors at the start of the string, so ^ is omitted
    m = re.match(r".*?\bJunior\b(\\)?(?(1)|.*)", s)
    if m:
        print(m.group())

Output

C:\temp\John Smith Junior\
C:\temp\John Smith Junior\
C:\temp\John Smith Junior file.pdf
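Since the question mentions a single multi-line string rather than a list, the same pattern can probably be applied with re.MULTILINE so that ^ matches at the start of every line. This is a sketch under that assumption; the variable names and the de-duplication step are illustrative additions, not part of the original answer.

```python
import re

# Assumed input shape: one path per line in a single string.
text = "\n".join([
    r"C:\temp\John Smith Junior\file.pdf",
    r"C:\temp\John Smith Junior\John Smith Junior\file.pdf",
    r"C:\temp\John Smith Junior file.pdf",
])

# re.MULTILINE lets ^ anchor at each line start instead of only at position 0.
pattern = re.compile(r"^.*?\bJunior\b(\\)?(?(1)|.*)", re.MULTILINE)

matches = [m.group() for m in pattern.finditer(text)]

# Optional: drop repeated folder paths while keeping the original order.
unique = list(dict.fromkeys(matches))

for path in unique:
    print(path)
```

Because . does not match a newline by default, each match stays within its own line.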

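Another option: match two names from the alternation in a row, separated by whitespace, and then any characters other than a newline or a backslash: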

^.*?\\[^\\\n]*\b(?:John|Smith|Junior)\s+(?:John|Smith|Junior)\b[^\\\n]*
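A quick way to try this second pattern is to run it against the three sample paths from the question. The sketch below does that with re.search; the variable names are illustrative.

```python
import re

strings = [
    r"C:\temp\John Smith Junior\file.pdf",
    r"C:\temp\John Smith Junior\John Smith Junior\file.pdf",
    r"C:\temp\John Smith Junior file.pdf",
]

pattern = re.compile(
    r"^.*?\\[^\\\n]*\b(?:John|Smith|Junior)\s+(?:John|Smith|Junior)\b[^\\\n]*"
)

results = []
for s in strings:
    m = pattern.search(s)
    if m:
        results.append(m.group())
        print(m.group())
```

Note that [^\\\n]* stops in front of the next backslash, so with this variant the folder matches come back without the trailing \, unlike the conditional pattern above.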


Regarding python - Regex: "don't return the rest" if "this condition"?, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/69528803/
