I'm trying to remove the <a> tags from lines that contain a specific phrase, as in the following:
text before line im interested in which may include <a> tag </a>
Go to <a href="#step2"> Step 2</a>
text after line im intrested in which may also include <a> tag </a>
What I've come up with so far is:
(?!(Go to|Return to|Continue to)( )?)(<a(.*)?>(?!(( )?Step \d( )?))(.*)?<\/a>)|(<a.*(Go to|Return to|Continue to).*\/a>)
But this doesn't seem to do what I need :-(
Expected result: Go to Step 2
What am I missing?
Best answer
My guess is that an expression like the one below might be close to what you have in mind, though I'm not sure.
Test with re.findall
import re

# Group 1 captures the lead-in phrase, group 2 the text inside the <a> tag.
regex = r"(go\s+to|return\s+to|continue\s+to)\s*<a\s+(?:[^>]+?)>([^<]+?)\s*</a>"
test_str = ("text before line im interested in which may include <a> tag </a>\n"
            "Go to <a href=\"#step2\"> Step 2</a>\n"
            "Return to <a href=\"#step2\"> Step 20 </a>\n"
            "CONTINUE To <a href=\"#step2\"> Step 20 </a>\n"
            "text after line im intrested in which may also include <a> tag </a>")

matches = re.findall(regex, test_str, re.IGNORECASE)
for match in matches:
    # Group 2 keeps the leading space of the link text, so plain concatenation reads correctly.
    print(match[0] + match[1])
Output
Go to Step 2
Return to Step 20
CONTINUE To Step 20
The expression is explained in the top-right panel of this demo, if you want to explore, simplify, or modify it.
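The re.findall test returns only the matching fragments. If the goal is instead to rewrite the text in place and leave the other lines (and their <a> tags) untouched, re.sub may be a better fit. The following is a minimal sketch under that assumption; `strip_step_links` and `sample` are hypothetical names that do not appear in the original answer, and the pattern is a slightly loosened variant that also accepts an <a> tag without attributes.

```python
import re

# Group 1: the lead-in phrase; group 2: the link text with surrounding spaces trimmed.
PATTERN = re.compile(
    r"(go\s+to|return\s+to|continue\s+to)\s*<a\b[^>]*>\s*([^<]*?)\s*</a>",
    re.IGNORECASE,
)


def strip_step_links(text):
    """Replace 'Go to <a ...>Step N</a>' with 'Go to Step N'.

    <a> tags that are not preceded by one of the phrases are left as they are.
    """
    return PATTERN.sub(lambda m: f"{m.group(1)} {m.group(2)}", text)


sample = (
    "text before which may include <a> tag </a>\n"
    'Go to <a href="#step2"> Step 2</a>\n'
    "text after which may also include <a> tag </a>"
)

print(strip_step_links(sample))
```

With this input, only the middle line changes; the first and last lines keep their <a> tags because none of the three phrases comes before them.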
Regarding "python - Regex to remove <a> and return only its text from an HTML line with specific language", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57045960/