python - How do I remove a substring in Python given its start and end markers?

Tags: python python-2.x

My string looks like this:

u'\'\'\'Joseph Michael "Joe" Acaba\'\'\' (born May 17, 1967) is an [[Teacher|educator]], [[Hydrogeology|hydrogeologist]], and [[NASA]] [[astronaut]].<ref name="bio">{{Cite web|url=http://www.jsc.nasa.gov/Bios/htmlbios/acaba-jm.html|title=Astronaut Bio: Joseph Acaba|month=February | year=2006|publisher=[[NASA|National Aeronautics and Space Administration]]|author=NASA|accessdate=November 26, 2006}}</ref><ref name="bio2">{{Cite web|url=http://oeop.larc.nasa.gov/hep/hep-astronauts.html|title=NASA Hispanic Astronauts\n|publisher=National Aeronautics and Space Administration|author=NASA|accessdate=November 26, 2006}}</ref> In May 2004 he became the first person'

I want to remove all text from <ref to ref>, including the markers themselves. I'm new to Python and not sure of the best way to do this.

Best answer

In this case, a regular expression works fine:

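import re
# Non-greedy match from '<ref' to the nearest following 'ref>'
ref = re.compile(u'<ref.*?ref>', re.DOTALL)

# yourtext stands for the string you want to clean
ref.sub(u'', yourtext)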

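Note the re.DOTALL flag: some of your <ref> sections contain newlines, and we want to remove those as well. Without the flag, `.` does not match a newline character.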
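To see what re.DOTALL changes, here is a minimal sketch on a shortened string. The `sample` text and the variable names are made up for illustration and are not the original data. Without the flag, the tag that contains a newline is left in place.

```python
import re

# Hypothetical shortened sample; the second <ref> contains a newline.
sample = u'A.<ref name="a">one</ref><ref name="b">two\nlines</ref> B'

without_flag = re.compile(u'<ref.*?ref>')
with_flag = re.compile(u'<ref.*?ref>', re.DOTALL)

# '.' stops at the newline, so only the first tag is removed.
result_without = without_flag.sub(u'', sample)

# With re.DOTALL, '.' also matches the newline, so both tags are removed.
result_with = with_flag.sub(u'', sample)
```

Here `result_without` still contains the second tag, while `result_with` is `u'A. B'`.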

Demo:

>>> import re
>>> tst=u'\'\'\'Joseph Michael "Joe" Acaba\'\'\' (born May 17, 1967) is an [[Teacher|educator]], [[Hydrogeology|hydrogeologist]], and [[NASA]] [[astronaut]].<ref name="bio">{{Cite web|url=http://www.jsc.nasa.gov/Bios/htmlbios/acaba-jm.html|title=Astronaut Bio: Joseph Acaba|month=February | year=2006|publisher=[[NASA|National Aeronautics and Space Administration]]|author=NASA|accessdate=November 26, 2006}}</ref><ref name="bio2">{{Cite web|url=http://oeop.larc.nasa.gov/hep/hep-astronauts.html|title=NASA Hispanic Astronauts\n|publisher=National Aeronautics and Space Administration|author=NASA|accessdate=November 26, 2006}}</ref> In May 2004 he became the first person'
>>> ref = re.compile(u'<ref.*?ref>', re.DOTALL)
>>> ref.sub(u'', tst)
u'\'\'\'Joseph Michael "Joe" Acaba\'\'\' (born May 17, 1967) is an [[Teacher|educator]], [[Hydrogeology|hydrogeologist]], and [[NASA]] [[astronaut]]. In May 2004 he became the first person'
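The same idea can be generalized to arbitrary start and end markers. The sketch below is one possible way to do it, not part of the original answer: `remove_between` is a hypothetical helper name, and it assumes the markers should be matched literally, which is why they are passed through `re.escape`.

```python
import re

def remove_between(text, start, end):
    """Remove every span from `start` through the next `end`, markers included."""
    # re.escape makes the markers match literally, even if they contain
    # regex metacharacters such as '{' or '['.
    pattern = re.compile(re.escape(start) + u'.*?' + re.escape(end), re.DOTALL)
    return pattern.sub(u'', text)

# Example with made-up input: strip {{...}} templates, including multi-line ones.
cleaned = remove_between(u'a {{x|y}} b {{z\n}} c', u'{{', u'}}')
```

Because the match is non-greedy, each start marker is paired with the nearest following end marker. Nested markers are not handled by this approach.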

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/12824957/
