I have a message that I am trying to split.
import re
message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."
split_message = re.split(r'[a-zA-Z]{3} (0[1-9]|[1-2][0-9]|3[0-1]), ([0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC', message)
print(split_message)
Expected output:
["This is update 1", "This is update 2", "This is update 3"]
Actual output:
['', '10', '17', 'This is update 1.', '10', '15', 'This is update 2.', '10', '15', 'This is update 3.']
I'm not sure what I'm missing.
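One way to see where the extra '10', '17', '15' items come from is to compile the pattern and inspect `Pattern.groups`, which reports how many capturing groups it contains. This is a minimal sketch using a shortened, two-update version of the message; the variable names are illustrative only.

```python
import re

# The pattern from the question: it contains two capturing groups,
# one for the day of month and one for the hour.
capturing = re.compile(
    r"[a-zA-Z]{3} (0[1-9]|[1-2][0-9]|3[0-1]), ([0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC"
)

# Pattern.groups is the number of capturing groups in the compiled pattern.
print(capturing.groups)  # 2

# re.split inserts the text of every group between the split pieces, so each
# timestamp contributes two extra items (day, hour) to the result.
message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2."
print(capturing.split(message))
# ['', '10', '17', 'This is update 1.', '10', '15', 'This is update 2.']
```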
Best answer
You are using capturing groups, which is why their contents also end up in the resulting list. Use non-capturing groups instead (they start with ?:):
import re
message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."
split_message = re.split(r"[a-zA-Z]{3} (?:0[1-9]|[1-2][0-9]|3[0-1]), (?:[0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC", message)
print(split_message)
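However, you will always get an empty entry first, because the first match of the split pattern is at the very start of the string, so the piece before it is an empty string: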
['', 'This is update 1.', 'This is update 2.', 'This is update 3.']
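If you want exactly the expected output from the question, a simple follow-up is to drop empty pieces and strip the trailing period. This is a sketch; stripping "." is an assumption based on the expected output shown in the question, not something `re.split` does for you.

```python
import re

message = (
    "Aug 10, 17:04 UTCThis is update 1."
    "Aug 10, 15:56 UTCThis is update 2."
    "Aug 10, 15:55 UTCThis is update 3."
)

# Same timestamp pattern as in the answer, using non-capturing groups (?:...).
timestamp = re.compile(
    r"[a-zA-Z]{3} (?:0[1-9]|[1-2][0-9]|3[0-1]), (?:[0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC"
)

# Skip empty pieces (the leading '' before the first timestamp) and strip the
# trailing period so the result matches the expected output (assumption).
updates = [part.rstrip(".") for part in timestamp.split(message) if part]
print(updates)  # ['This is update 1', 'This is update 2', 'This is update 3']
```

Filtering with `if part` removes any empty piece, not only the leading one, which also covers a timestamp at the very end of the string.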
As stated in the docs:
If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
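A minimal illustration of the documented behaviour, on a made-up string rather than the question's message: a capturing group keeps the separators, a non-capturing group does not, and a match at the start of the string yields a leading empty string.

```python
import re

text = "a1b22c"

# Capturing group: the matched separators are returned as list items too.
print(re.split(r"(\d+)", text))    # ['a', '1', 'b', '22', 'c']

# Non-capturing group: only the text between separators is returned.
print(re.split(r"(?:\d+)", text))  # ['a', 'b', 'c']

# A separator at the very start produces an empty first element.
print(re.split(r"\d+", "9a1b"))    # ['', 'a', 'b']
```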
Source: "python - Split string based on regex pattern", Stack Overflow: https://stackoverflow.com/questions/68925596/