I have a string:
s=
"(2021-06-29T10:53:42.647Z) [Denis]: hi
(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING
(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane
(2021-06-29T11:58:29.053Z) [Nicholas]:
(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#
(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021
(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##"
I want to extract the message text from it. The expected output is:
comments=['hi','TA FOR SHOWING','how are you bane',' ','#END_REMOTE#','VAL 01JUL2021','##ENDED AT 08:07 GMT##']
Here is what I tried:
comments=re.findall(r']:\s+(.*?)\n',s)
The regex works well, but I cannot get the blank message back as ''.
Best answer
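You can exclude ] from what the capture group matches. If you also want the value on the last line, assert the end of the line with $ (together with re.MULTILINE) instead of requiring a newline with \n.
Note that \s can match a newline, and the negated character class [^]]* can match a newline as well.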
]:\s+([^]]*)$
import re
regex = r"]:\s+([^]]*)$"
s = ("(2021-06-29T10:53:42.647Z) [Denis]: hi\n"
"(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING\n"
"(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
"(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
"(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#\n"
"(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021\n"
"(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")
print(re.findall(regex, s, re.MULTILINE))
Output
['hi', 'TA FOR SHOWING', 'how are you bane ', '', '#END_REMOTE#', 'VAL 01JUL2021', '##ENDED AT 08:07 GMT##']
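To see why the original attempt loses the blank message, here is a small sketch that runs both patterns on the same input. The shortened sample string and the variable names are made up for illustration; only the two regexes come from the question and the answer.

```python
import re

# Shortened, hypothetical sample in the same shape as the chat log.
sample = (
    "(t1) [A]: hi\n"
    "(t2) [B]: \n"
    "(t3) [B]: #END#\n"
    "(t4) [A]: bye"
)

# Pattern from the question: \s+ swallows the newline after the blank
# message, so the lazy group captures the whole next line instead. The
# last line is skipped because it has no trailing \n.
attempt = re.findall(r"]:\s+(.*?)\n", sample)

# Pattern from the answer: with re.MULTILINE, $ matches at every line end,
# so \s+ backtracks to the single space and the group captures ''.
fixed = re.findall(r"]:\s+([^]]*)$", sample, re.MULTILINE)

print(attempt)
print(fixed)
```

On this sample the first call returns ['hi', '(t3) [B]: #END#'] and the second returns ['hi', '', '#END#', 'bye'].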
If you do not want the match to cross line boundaries:
]:[^\S\n]+([^]\n]*)$
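A minimal sketch of the line-bound pattern in use. The sample string and names are hypothetical. The second pattern, which swaps + for * so that a message with nothing at all after the colon still yields '', is a variation added here as an assumption and is not part of the original answer.

```python
import re

# Hypothetical sample: the second message has no space after the colon.
sample = (
    "(t1) [A]: hi\n"
    "(t2) [B]:\n"
    "(t3) [A]: bye"
)

# [^\S\n] matches any whitespace except a newline; [^]\n]* stops at line end.
line_bound = re.compile(r"]:[^\S\n]+([^]\n]*)$", re.MULTILINE)

# Variation (not from the original answer): * makes the separator optional.
optional_space = re.compile(r"]:[^\S\n]*([^]\n]*)$", re.MULTILINE)

strict_result = line_bound.findall(sample)
relaxed_result = optional_space.findall(sample)
print(strict_result)
print(relaxed_result)
```

With the + quantifier the line without a trailing space is skipped, giving ['hi', 'bye']; the * variation gives ['hi', '', 'bye']. Which one is appropriate depends on whether the log always writes a space after the colon.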
For python-3.x - regex for blank strings, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/69897354/