- From the text below, I only want to get "4th of July":
Hello, happy [4th of July]. I love the [[firework]]
I have this text:
text = {{Hey, how are you.}} I've watched John Mulaney all night. [[Category: Comedy]] [[Image: John Mulaney]]
I am trying to remove {{Hey, how are you.}}, [[Category: Comedy]], and [[Image: John Mulaney]]. This is what I have tried so far, but it doesn't seem to work:
hey_how_are_you = re.compile(r'\{\{.*\}\}')
category = re.compile(r'\[\[Category:.*?\]\]')
image = re.compile(r'\[\[Image:.*?\]\]')
text = hey_how_are_you.sub('', text)
text = category.sub('', text)
text = image.sub('', text)
Best answer
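import re

# 1. Capture text inside single brackets, skipping double brackets.
text = "Hello, happy [4th of July]. I love the [[firework]]. "
matches = re.findall(r"(?<!\[)\[([^\[\]]+)\]", text)
print(matches, "\n", matches[0])

# 2. Remove the {{...}} template and the Category/Image links in one pass.
text2 = " {{Hey, how are you.}} I've watched John Mulaney all night. [[Category: Comedy]] [[Image: John Mulaney]]"
print(re.sub(r"\{\{.*?\}\}|\[\[\s*Category:.*?\]\]|\[\[\s*Image:.*?\]\]", "", text2))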
Output:
['4th of July']
4th of July
I've watched John Mulaney all night.
For the first problem, you can use a negative lookbehind, (?<!\[), so that an opening bracket immediately preceded by another [ is not matched.
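To see what the lookbehind contributes, here is a small sketch that runs the same pattern with and without it on the question's sample text. The variable names are only illustrative and are not part of the original answer.

```python
import re

text = "Hello, happy [4th of July]. I love the [[firework]]"

# Without the lookbehind, the inner "[firework]" of "[[firework]]" also matches.
without_lookbehind = re.findall(r"\[([^\[\]]+)\]", text)

# With (?<!\[), an opening bracket that directly follows another "[" is rejected.
with_lookbehind = re.findall(r"(?<!\[)\[([^\[\]]+)\]", text)

print(without_lookbehind)  # ['4th of July', 'firework']
print(with_lookbehind)     # ['4th of July']
```

The character class [^\[\]]+ already prevents a match from starting at the outer bracket of [[firework]], so the lookbehind is what rules out the inner one.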
Your regexes for the second problem work for me. (What error do you get?) However, it can also be solved in one pass.
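As a rough check of that claim, the sketch below applies the three substitutions one after another and compares the result with a single combined pattern. The grouped alternation (?:Category|Image) and the final whitespace cleanup are my own additions for illustration and do not appear in the question or the answer.

```python
import re

text = ("{{Hey, how are you.}} I've watched John Mulaney all night. "
        "[[Category: Comedy]] [[Image: John Mulaney]]")

# Three separate substitutions, following the approach in the question
# (written here with raw strings and non-greedy quantifiers).
step = re.sub(r"\{\{.*?\}\}", "", text)
step = re.sub(r"\[\[Category:.*?\]\]", "", step)
step = re.sub(r"\[\[Image:.*?\]\]", "", step)

# The same removals expressed as one alternation, applied in a single pass.
one_pass = re.sub(r"\{\{.*?\}\}|\[\[\s*(?:Category|Image):.*?\]\]", "", text)

print(step == one_pass)  # True

# Assumption: leftover spaces are unwanted, so collapse runs of whitespace.
cleaned = " ".join(one_pass.split())
print(cleaned)  # I've watched John Mulaney all night.
```

Both approaches leave stray spaces where the markup used to be, which is why the last step normalizes whitespace; skip it if the spacing should be preserved.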
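Regarding "python - get the text inside a single pair of brackets, but not the text inside double square brackets", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51183238/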