I have a string that contains sentences of the following form:
"Ms Smith to talk to her colleague Ms Smith to create new events for the team. team Leader's assistant to organise morning stand-up session. to drive around the city."
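- The sentences may or may not have punctuation or correct capitalisation.
- The text may also contain noise (extra characters, words).
- I want to slice it on the following structures: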
  - "to Miss/Ms/Mr/Mrs"
  - "Miss/Ms/Mr/Mrs"
  - "team leader"
  - "team leader to"
  - ". to"
I want to slice it into a list:
["Ms Smith to talk to her colleague",
"Ms Smith to create new events for the team.",
"team Leader's assistant to organise morning stand-up session.",
"to drive around the city."]
My current solution works, but it is very un-Pythonic, and I'm sure there is a way to avoid the while loop:
import re

def slice(text):
    parts = []
    # Note: the rule is lowercase only, so it matches titles such as "Ms" only
    # if the text is lowercased first or re.IGNORECASE is passed to findall.
    rule = r"(^.+?)(?:(?:miss [a-z]+|ms [a-z]+|mrs [a-z]+|mr [a-z]+|team leader)(?:'s [a-z ]+?)?|\.) to.+?$"
    while True:
        try:
            part = re.findall(rule, text)[0]
            parts.append(part)
            # Remove part from text for next iteration
            text = text[len(part):]
        except IndexError:
            # findall returned empty list
            break
    # Add the remainder
    parts.append(text)
    return parts
Thanks for your help!
Best answer
You can do everything you want with just findall and capturing subgroups. Is this the output you want?
import re

s = (
    "Ms Smith to talk to her colleague Ms Smith to create new events for the team. "
    "team Leader's assistant to organise morning stand-up session. to drive around the city."
)
# Under re.VERBOSE unescaped whitespace in the pattern is ignored,
# so the space in "team leader" has to be escaped.
roles = r"Miss|Ms|Mr|Mrs|team\ leader"
matches = re.findall(
    rf"""
    (
        \ ?
        (
            (?:{roles})?        # Read the role
            (?:[\w'\-\ ]*?)     # and name into the "identity" group, m[1]
        )
        to
        ([\w'\-\ ]+?)           # Read the other words into the "task" group, m[2]
        (?={roles}|\.)          # until next role or dot
        [\.\ ]?
    )
    """,
    s,
    flags=re.IGNORECASE | re.VERBOSE,
)
print("Full matches:")
for m in matches:
    print(" *", m[0].strip())
print("\nSplit by identity and task:")
for full, identity, task in matches:
    print(f" * Identity: '{identity}', task: '{task.strip()}', full match: '{full.strip()}'")
Output:
Full matches:
 * Ms Smith to talk to her colleague
 * Ms Smith to create new events for the team.
 * team Leader's assistant to organise morning stand-up session.
 * to drive around the city.

Split by identity and task:
 * Identity: 'Ms Smith ', task: 'talk to her colleague', full match: 'Ms Smith to talk to her colleague'
 * Identity: 'Ms Smith ', task: 'create new events for the team', full match: 'Ms Smith to create new events for the team.'
 * Identity: 'team Leader's assistant ', task: 'organise morning stand-up session', full match: 'team Leader's assistant to organise morning stand-up session.'
 * Identity: '', task: 'drive around the city', full match: 'to drive around the city.'
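If only the list of slices from the question is needed, without the identity/task breakdown, another possible route is re.split with look-arounds: the whitespace between two parts is consumed, while the marker that starts the next part stays in the result. The following is only a sketch under assumptions: the function name split_tasks and the exact marker pattern are illustrative, loosely adapted from the rule in the question, and have only been checked against the sample sentence, not against noisy input.

```python
import re

# Hypothetical splitter: matches the whitespace *between* two parts.
# The look-arounds are zero-width, so the markers themselves are kept.
SPLIT_RE = re.compile(
    r"""
    \s+                                   # whitespace dropped between parts
    (?=                                   # ...only where a new part starts:
        (?:(?:miss|ms|mrs|mr)\s+[a-z]+    #   a title plus a name
          |team\s+leader)                 #   or "team leader"
        (?:'s\s+[a-z ]+?)?                #   optional possessive, e.g. 's assistant
        \s+to\b                           #   followed by the word "to"
    )
    |
    (?<=\.)\s+(?=to\b)                    # or whitespace between a full stop and "to"
    """,
    re.IGNORECASE | re.VERBOSE,
)


def split_tasks(text):
    """Split text into parts, keeping the leading marker of each part."""
    return [part for part in SPLIT_RE.split(text.strip()) if part]


text = (
    "Ms Smith to talk to her colleague Ms Smith to create new events for the team. "
    "team Leader's assistant to organise morning stand-up session. to drive around the city."
)
parts = split_tasks(text)
for part in parts:
    print(repr(part))
# 'Ms Smith to talk to her colleague'
# 'Ms Smith to create new events for the team.'
# "team Leader's assistant to organise morning stand-up session."
# 'to drive around the city.'
```

Because the pattern has no capturing groups, re.split returns only the pieces between the matches; adding a capturing group would interleave the captured text into the result list.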
Regarding Python/regex: How to slice string by regex pattern while keeping the pattern in the matches as well?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60344077/