import re
s = 'PythonCookbookListOfContents'
# the first line does not work
print re.split('(?<=[a-z])(?=[A-Z])', s )
# second line works well
print re.sub('(?<=[a-z])(?=[A-Z])', ' ', s)
# it should be ['Python', 'Cookbook', 'List', 'Of', 'Contents']
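How can I use Python's re module to split a string at the boundary between a lowercase and an uppercase character?
Why does the first line fail while the second one works?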
According to the documentation for re.split (as it read before Python 3.7, which changed this behavior):
Note that split will never split a string on an empty pattern match.
For example:
>>> re.split('x*', 'foo')
['foo']
>>> re.split("(?m)^$", "foo\n\nbar\n")
['foo\n\nbar\n']
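The quoted restriction can be worked around with a small helper. The following is only a sketch: the name `split_on_case_boundary` is made up for illustration, and the fallback branch assumes the input never contains a NUL character, which is used as a temporary marker. On Python 3.7 and later, re.split does split on empty matches, so the lookaround pattern from the question can be used directly; on older versions the helper falls back to the re.sub idea from the question and then splits on the inserted marker.

```python
import re
import sys

# Zero-width position between a lowercase letter and an uppercase letter.
BOUNDARY = r'(?<=[a-z])(?=[A-Z])'

def split_on_case_boundary(s):
    """Split s wherever a lowercase letter is followed by an uppercase one."""
    if sys.version_info >= (3, 7):
        # Python 3.7+ allows re.split to split on empty matches.
        return re.split(BOUNDARY, s)
    # Older versions: insert a marker at each boundary, then split on it.
    # Assumption: s does not already contain the NUL character.
    return re.sub(BOUNDARY, '\0', s).split('\0')

print(split_on_case_boundary('PythonCookbookListOfContents'))
# ['Python', 'Cookbook', 'List', 'Of', 'Contents']
```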
How about using re.findall instead? (Rather than focusing on the separators, focus on the items you want to get.)
>>> import re
>>> s = 'PythonCookbookListOfContents'
>>> re.findall('[A-Z][a-z]+', s)
['Python', 'Cookbook', 'List', 'Of', 'Contents']
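One caveat with the pattern above: it only returns words that start with a capital letter, so a leading lowercase word (as in 'camelCaseString') would be dropped. A possible variation, offered as a sketch rather than a general solution, makes the leading capital optional. The helper name `case_words` is hypothetical, and anything that fits neither shape (digits, runs of capitals) is still skipped.

```python
import re

def case_words(s):
    # An optional leading capital followed by one or more lowercase letters.
    return re.findall(r'[A-Z]?[a-z]+', s)

print(case_words('PythonCookbookListOfContents'))
# ['Python', 'Cookbook', 'List', 'Of', 'Contents']
print(case_words('camelCaseString'))
# ['camel', 'Case', 'String']
```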
Update
Using the regex module (an alternative regular expression module intended to replace re), you can split on zero-width matches:
>>> import regex
>>> s = 'PythonCookbookListOfContents'
>>> regex.split('(?<=[a-z])(?=[A-Z])', s, flags=regex.VERSION1)
['Python', 'Cookbook', 'List', 'Of', 'Contents']
Note: specify the regex.VERSION1 flag to enable splitting on zero-length matches.