I have a string in the following format.
Test 2 Lorem ipsum dolor sit amet consectetur adipisicing elit.
Test 3 Lorem ipsum dolor sit amet consectetur adipisicing elit.
Test 4 Lorem ipsum dolor sit amet consectetur adipisicing elit.
Test 5 Lorem ipsum dolor sit amet consectetur adipisicing elit.
How can I remove Test 2, Test 3, and so on, so that the string looks like this?
Lorem ipsum dolor sit amet consectetur adipisicing elit.
Lorem ipsum dolor sit amet consectetur adipisicing elit.
Lorem ipsum dolor sit amet consectetur adipisicing elit.
Lorem ipsum dolor sit amet consectetur adipisicing elit.
Lorem ipsum dolor sit amet consectetur adipisicing elit.
I tried:
test1 = re.compile(r'^Test \d ')
test2 = re.compile(r'^Test \d\d ')
text = re.sub(test1, '', text)
text = re.sub(test2, '', text)
But it did not work.
Best answer
Based on the samples you showed, please try the following. It also works when a line starts with one or more occurrences of Test followed by digits.
import re
var = """Test 2 Lorem ipsum dolor sit amet consectetur adipisicing elit.
Test 3 Lorem ipsum dolor sit amet consectetur adipisicing elit.
Test 4 Lorem ipsum dolor sit amet consectetur adipisicing elit.
Test 5 Lorem ipsum dolor sit amet consectetur adipisicing elit."""
# re.M makes ^ match at the start of every line, not only at the start of the string
print(re.sub(r'^(Test\s+\d+)(\s+Test\s+\d+)*\s*', '', var, flags=re.M))
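Explanation: this uses the re.sub function from Python's re library. The regex is passed to re.sub so that every match in var (the variable) is replaced with an empty string. The re.M flag matters here: without it, ^ matches only at the very start of the string, which is why the original attempt removed nothing after the first line.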
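To illustrate the effect of the re.M flag, here is a minimal sketch that runs the same substitution with and without it. The two-line sample text and the simplified pattern `^Test \d+ ` are assumptions made for illustration; they are not part of the original answer.

```python
import re

# Hypothetical two-line sample, shortened from the question's text.
text = "Test 2 Lorem ipsum\nTest 13 dolor sit amet"

# Without re.M, ^ matches only at the very start of the string,
# so only the prefix on the first line is removed.
without_flag = re.sub(r'^Test \d+ ', '', text)

# With re.M, ^ also matches immediately after each newline,
# so the prefix is removed from every line.
with_flag = re.sub(r'^Test \d+ ', '', text, flags=re.M)

print(without_flag)
print(with_flag)
```

Using `\d+` instead of separate `\d` and `\d\d` patterns covers numbers of any length in a single call.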
Explanation of the regex:
^(Test\s+\d+) ## From the start of a line (because of re.M), match Test, then 1 or more whitespace characters, then 1 or more digits.
(\s+Test\s+\d+)* ## Match 1 or more whitespace characters, then Test, then 1 or more whitespace characters, then 1 or more digits; this whole group may occur 0 or more times.
\s* ## Match 0 or more whitespace characters before the remaining text.
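The three parts above can also be written as one commented pattern using re.VERBOSE. This is only a sketch of the same regex laid out for readability; the name PREFIX and the sample input with a repeated prefix are hypothetical.

```python
import re

# Same pattern as in the answer, spread over several lines with re.VERBOSE.
# In verbose mode, whitespace in the pattern is ignored and # starts a comment.
PREFIX = re.compile(
    r"""
    ^(Test\s+\d+)       # Test, whitespace, digits at the start of a line
    (\s+Test\s+\d+)*    # zero or more further Test-plus-digits groups
    \s*                 # whitespace left before the real text
    """,
    flags=re.M | re.VERBOSE,
)

# Hypothetical input: the first line carries the prefix twice.
sample = "Test 2 Test 31 Lorem ipsum\nTest 4 dolor sit amet"
cleaned = PREFIX.sub("", sample)
print(cleaned)
```

One thing to keep in mind: `\s` also matches newline characters, so a line that contains only a prefix such as Test 2 would be removed together with its line break.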
Regarding "python - How to remove a specific pattern from a Python string using regex?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/67975062/