I have a long string that is a paragraph, but there is no space after some of the periods. For example:
para = "I saw this film about 20 years ago and remember it as being particularly nasty. I believe it is based on a true incident: a young man breaks into a nurses\' home and rapes, tortures and kills various women.It is in black and white but saves the colour for one shocking shot.At the end the film seems to be trying to make some political statement but it just comes across as confused and obscene.Avoid."
I'm trying to use re.sub to fix this, but the output is not what I expect.
This is what I did:
re.sub("(?<=\.).", " \1", para)
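I'm matching the first character of each sentence, and I want to put a space before it. My match pattern is (?<=\.). which (supposedly) checks for any character that appears after a period. I learned from other Stack Overflow questions that \1 matches the last matched pattern, so I wrote my replacement pattern as " \1", a space followed by the previously matched string.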
Here is the output:
"I saw this film about 20 years ago and remember it as being particularly nasty. \x01I believe it is based on a true incident: a young man breaks into a nurses\' home and rapes, tortures and kills various women. \x01t is in black and white but saves the colour for one shocking shot. \x01t the end the film seems to be trying to make some political statement but it just comes across as confused and obscene. \x01void. \x01
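Instead of matching any character preceded by a period and adding a space before it, re.sub replaces the matched character with \x01. Why? How do I add a character before a matched string?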
Best answer
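(?<=a)b is a positive lookbehind. It matches a b that follows an a. The a is not captured. So in your expression, I'm not sure what the value of \1 represents in this case, but it is not what is inside (?<=...).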
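A small sketch of what is probably going on, not a definitive diagnosis: in a regular (non-raw) Python string literal, `\1` is an octal escape for the character with code 1, so re.sub receives a literal `\x01` and no backreference at all. With a raw-string replacement, `\1` is a real backreference, but it then needs a capturing group, which the lookbehind does not provide. The variable names `sample`, `broken` and `fixed` are made up for illustration.

```python
import re

# Hypothetical short input, standing in for the longer paragraph.
sample = "kills various women.It is in black and white."

# In a non-raw string literal, "\1" is an octal escape, i.e. chr(1).
print(" \1" == " \x01")  # True

# The question's call: the character after each period is replaced by " \x01".
broken = re.sub(r"(?<=\.).", " \1", sample)

# Capturing the character and using a raw-string replacement makes \1
# refer to that captured character.
fixed = re.sub(r"(?<=\.)(.)", r" \1", sample)

print(repr(broken))
print(repr(fixed))
```

Without the parentheses around the final `.`, the raw-string replacement `r" \1"` would raise `re.error`, because the pattern would contain no group 1.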
Your current approach has another flaw: it would add a space after a . even when one is already there.
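To illustrate that flaw, here is a minimal sketch that assumes a capturing variant of the question's pattern, `(?<=\.)(.)`, so that the backreference works. The input string `sample` is hypothetical.

```python
import re

# Hypothetical input: one period already followed by a space, one not.
sample = "particularly nasty. I believe it is true.Avoid."

# Any character after a period gets a space in front of it,
# including a character that is already a space.
result = re.sub(r"(?<=\.)(.)", r" \1", sample)

print(repr(result))
```

The period that already had a space after it ends up followed by two spaces, while the missing space before "Avoid." is added as intended.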
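To add a missing space after a ., I suggest a different strategy: replace each . that is followed by a character that is neither a space nor a dot with a . and a space: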
re.sub(r'\.(?=[^ .])', '. ', para)
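The substitution above can be tried on a short made-up string to check the three cases it is meant to handle: a period already followed by a space, a run of dots, and a period at the very end of the text. This is only a sketch; `sample` and `spaced` are illustrative names.

```python
import re

# Hypothetical input covering: existing space, an ellipsis, a missing
# space, and a final period with nothing after it.
sample = "It was nasty. I mean it...really.Avoid."

# A period is replaced by ". " only when the next character exists
# and is neither a space nor another period.
spaced = re.sub(r"\.(?=[^ .])", ". ", sample)

print(spaced)
```

One caveat worth keeping in mind: the pattern treats every period the same way, so a decimal such as 3.14 would become "3. 14". If the text contains numbers, the lookahead may need to be narrowed, for example to letters only.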
This question, "python - Regular expression to add a character to a matched string", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/42731970/