Python/regex: How to slice string by regex pattern while keeping the pattern in the matches as well?

Tags: python regex

I have a string containing sentences of the following form:

"Ms Smith to talk to her colleague Ms Smith to create new events for the team. team Leader's assistant to organise morning stand-up session. to drive around the city."

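  • Sentences may or may not have punctuation or correct capitalization.
  • There may also be noise (extra characters or words) in the text.
  • I want to slice by the following structures:
    • "Miss/Ms/Mr/Mrs <name> to"
    • "Miss/Ms/Mr/Mrs <name>'s ... to"
    • "team leader to"
    • "team leader's ... to"
    • ". to"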

I want to slice it into a list:

["Ms Smith to talk to her colleague",
"Ms Smith to create new events for the team.",
"team Leader's assistant to organise morning stand-up session.",
"to drive around the city."]

My current solution works, but it is very unpythonic, and I'm sure there is a way to avoid the while loop:

import re


def slice_text(text):
    parts = []
    # Raw string, so "\." is passed to the regex engine unchanged
    rule = r"(^.+?)(?:(?:miss [a-z]+|ms [a-z]+|mrs [a-z]+|mr [a-z]+|team leader)(?:'s [a-z ]+?)?|\.) to.+?$"
    while True:
        try:
            # IGNORECASE lets "Ms Smith" and "team Leader" match the lowercase alternatives
            part = re.findall(rule, text, flags=re.IGNORECASE)[0]
            parts.append(part)
            # Remove part from text for next iteration
            text = text[len(part):]
        except IndexError:
            # findall returned empty list
            break
    # Add the remainder
    parts.append(text)
    return parts
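The slicing done by the while loop can also be written without a loop. The following is a hedged sketch (not the asker's code, and not part of the answer) that calls `re.split` on the whitespace in front of a role, or on the whitespace after a full stop when the next word is "to". Because the roles are only inspected through a lookahead, they stay at the start of each piece. The helper name `slice_sentences` and the exact boundary rules are assumptions that only approximate the regex above; for example, the sketch does not check that a name follows the title.

```python
import re

# Assumed boundary rules, loosely mirroring the question's regex:
#   1. whitespace right before a role (Miss/Ms/Mrs/Mr or "team leader"), or
#   2. whitespace after a full stop when the next word is "to".
# The lookaheads and the lookbehind are zero-width, so only the whitespace is consumed.
SPLIT_RULE = re.compile(
    r"\s+(?=(?:miss|ms|mrs|mr)\b|team leader)"
    r"|(?<=\.)\s+(?=to\b)",
    flags=re.IGNORECASE,
)


def slice_sentences(text):
    """Hypothetical helper: split text into pieces that keep their role prefix."""
    # Drop the empty piece produced when the text starts with a split point.
    return [part for part in SPLIT_RULE.split(text) if part]


text = (
    "Ms Smith to talk to her colleague Ms Smith to create new events for the team. "
    "team Leader's assistant to organise morning stand-up session. to drive around the city."
)
parts = slice_sentences(text)
for part in parts:
    print(part)
```

For the sample string this yields the four pieces from the desired list. Since the pattern has no capturing groups, `re.split` does not insert the separators into the result.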

Thanks for your help!

Best answer

You can do everything you want with a single findall call and capturing groups. Is this the output you want?
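The answer relies on how `re.findall` shapes its result. With no groups it returns whole matches, with one group it returns that group's text, and with several groups it returns one tuple per match. The last case is why the answer's loop can unpack `full, identity, task`. A toy illustration, with a made-up sample string:

```python
import re

sample = "a 1 b 2"

# No groups: each item is the whole match.
whole = re.findall(r"\w+ \d", sample)
# One group: each item is just that group's text.
one_group = re.findall(r"(\w+) \d", sample)
# Several groups: each item is a tuple with one entry per group, in order.
tuples = re.findall(r"((\w+) (\d))", sample)

print(whole)      # ['a 1', 'b 2']
print(one_group)  # ['a', 'b']
print(tuples)     # [('a 1', 'a', '1'), ('b 2', 'b', '2')]
```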

import re

s = "Ms Smith to talk to her colleague Ms Smith to create new events for the team. " +\
    "team Leader's assistant to organise morning stand-up session. to drive around the city."

roles = r"Miss|Ms|Mr|Mrs|team\ leader"  # escaped space: re.VERBOSE would otherwise drop it
matches = re.findall(rf"""
    (
        \ ?
        (
            (?:{roles})?      # Read the role
            (?:[\w\'\-\ ]*?)  # and name to group(1) aka "identity"
        )
        to
        ([\w\'\-\ ]+?)  # Read the other words to group(2) aka "task"
        (?={roles}|\.)  # until next role or dot
        [\.\ ]?
    )
    """,
    s,
    flags=re.IGNORECASE | re.VERBOSE,
)

print("Full matches:")
for m in matches:
    print(" *", m[0].strip())


print("\nSplit by identity and task:")
for full, identity, task in matches:
    print(f" * Identity: '{identity}', task: '{task.strip()}', full match: '{full.strip()}'")

Output:

Full matches:
 * Ms Smith to talk to her colleague
 * Ms Smith to create new events for the team.
 * team Leader's assistant to organise morning stand-up session.
 * to drive around the city.

Split by identity and task:
 * Identity: 'Ms Smith ', task: 'talk to her colleague', full match: 'Ms Smith to talk to her colleague'
 * Identity: 'Ms Smith ', task: 'create new events for the team', full match: 'Ms Smith to create new events for the team.'
 * Identity: 'team Leader's assistant ', task: 'organise morning stand-up session', full match: 'team Leader's assistant to organise morning stand-up session.'
 * Identity: '', task: 'drive around the city', full match: 'to drive around the city.'
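If you would rather read the pieces by name than by tuple position, a variant of the same pattern can use named groups with `re.finditer`. This is a sketch under stated assumptions: the group names `identity` and `task` and the `rows` layout are illustrative choices, not part of the answer. The space in "team leader" is escaped because `re.VERBOSE` ignores unescaped whitespace in the pattern.

```python
import re

s = (
    "Ms Smith to talk to her colleague Ms Smith to create new events for the team. "
    "team Leader's assistant to organise morning stand-up session. to drive around the city."
)

# The space in "team leader" is escaped because re.VERBOSE ignores unescaped whitespace.
roles = r"Miss|Ms|Mr|Mrs|team\ leader"

pattern = re.compile(
    rf"""
    \ ?                                    # optional leading space
    (?P<identity>(?:{roles})?[\w'\-\ ]*?)  # optional role, then the name
    to
    (?P<task>[\w'\-\ ]+?)                  # task words, matched lazily
    (?={roles}|\.)                         # stop before the next role or a dot
    [.\ ]?                                 # consume that dot or one space
    """,
    flags=re.IGNORECASE | re.VERBOSE,
)

# finditer yields Match objects, so groups can be read by name.
rows = [
    {"identity": m["identity"].strip(), "task": m["task"].strip(), "full": m[0].strip()}
    for m in pattern.finditer(s)
]
for row in rows:
    print(row)
```

The whole match (`m[0]`) plays the role of the outer group in the answer, so no extra wrapping group is needed.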

For "Python/regex: How to slice string by regex pattern while keeping the pattern in the matches as well?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60344077/
