I am trying to write a regular expression in Python that finds all img tags whose src attribute equals a specific value. I tried the following:
import re

# where thm equals /public_media/cache/84/b5/84b59e293cbdb7041b68a84977d62cf3.jpg?image_pk=82
p = re.compile(r'<img.*?%s.*?>' % thm)
print(p.pattern)
print(p.sub(linked_image, c))
This is the output I get:
<img.*?/public_media/cache/84/b5/84b59e293cbdb7041b68a84977d62cf3.jpg?image_pk=82.*?>
<p><img src="/public_media/cache/84/b5/84b59e293cbdb7041b68a84977d62cf3.jpg?image_pk=82" alt=""></p><p>lksj lksdfj lsdkfj sldkfj sldkfj lskdfj lsjf lksjf lksj flksdjf klsj flk dkj sdlkfj sdlkfj sldkjf sldkfj lsdkjf lskjflsjfsl lksdjf </p><p>lksj lksdfj lsdkfj sldkfj sldkfj lskdfj lsjf lksjf lksj flksdjf klsj flk dkj sdlkfj sdlkfj sldkjf sldkfj lsdkjf lskjflsjfsl lksdjf
</p><p>lksj lksdfj lsdkfj sldkfj sldkfj lskdfj lsjf lksjf lksj flksdjf klsj flk dkj sdlkfj sdlkfj sldkjf sldkfj lsdkjf lskjflsjfsl lksdjf
</p><p>lksj lksdfj lsdkfj sldkfj sldkfj lskdfj lsjf lksjf lksj flksdjf klsj flk dkj sdlkfj sdlkfj sldkjf sldkfj lsdkjf lskjflsjfsl lksdjf
</p><p>lksj lksdfj lsdkfj sldkfj sldkfj lskdfj lsjf lksjf lksj flksdjf klsj flk dkj sdlkfj sdlkfj sldkjf sldkfj lsdkjf lskjflsjfsl lksdjf
</p><p>lksj lksdfj lsdkfj sldkfj sldkfj lskdfj lsjf lksjf lksj flksdjf klsj flk dkj sdlkfj sdlkfj sldkjf sldkfj lsdkjf lskjflsjfsl lksdjf
</p><p>lksj lksdfj lsdkfj sldkfj sldkfj lskdfj lsjf lksjf lksj flksdjf klsj flk dkj sdlkfj sdlkfj sldkjf sldkfj lsdkjf lskjflsjfsl lksdjf
</p><p>lksj lksdfj lsdkfj sldkfj sldkfj lskdfj lsjf lksjf lksj flksdjf klsj flk dkj sdlkfj sdlkfj sldkjf sldkfj lsdkjf lskjflsjfsl lksdjf
</p><p>lksj lksdfj lsdkfj sldkfj sldkfj lskdfj lsjf lksjf lksj flksdjf klsj flk dkj sdlkfj sdlkfj sldkjf sldkfj lsdkjf lskjflsjfsl lksdjf
</p><p>lksj lksdfj lsdkfj sldkfj sldkfj lskdfj lsjf lksjf lksj flksdjf klsj flk dkj sdlkfj sdlkfj sldkjf sldkfj lsdkjf lskjflsjfsl lksdjf
</p><p>lksj lksdfj lsdkfj sldkfj sldkfj lskdfj lsjf lksjf lksj flksdjf klsj flk dkj sdlkfj sdlkfj sldkjf sldkfj lsdkjf lskjflsjfsl lksdjf
</p><p>lksj lksdfj lsdkfj sldkfj sldkfj lskdfj lsjf lksjf lksj flksdjf klsj flk dkj sdlkfj sdlkfj sldkjf sldkfj lsdkjf lskjflsjfsl lksdjf
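A likely reason the pattern above never matches is that thm is interpolated into the regex unescaped, so the "?" in the URL is read as a quantifier ("an optional g") rather than a literal question mark. The sketch below illustrates this with re.escape. The document c is shortened here, and linked_image is a hypothetical stand-in, since the original callback is not shown in the question.

```python
import re

# Value of thm taken from the question.
thm = '/public_media/cache/84/b5/84b59e293cbdb7041b68a84977d62cf3.jpg?image_pk=82'
# Shortened stand-in for the document c in the question.
c = '<p><img src="%s" alt=""></p><p>lksj lksdfj</p>' % thm


def linked_image(match):
    # Hypothetical replacement callback: wrap the matched tag in a link.
    return '<a href="%s">%s</a>' % (thm, match.group(0))


# Unescaped: "g?" means "an optional g", so the literal "?" in the URL never matches.
unescaped = re.compile(r'<img.*?%s.*?>' % thm)
# Escaped: every character of thm is matched literally.
escaped = re.compile(r'<img.*?%s.*?>' % re.escape(thm))

print(unescaped.search(c))
print(escaped.sub(linked_image, c))
```

Note that ".*?" can also run past the end of one tag into a later one, so "[^>]*?" may be a safer choice than ".*?" inside the tag when the document contains several img elements.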
Best answer
Solution with lxml
I created a separate post to compare the regular-expression and lxml solutions.
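A simpler and more robust solution is to use lxml with etree. With this approach you access specific DOM elements and edit them directly.
Parse the HTML string and select the elements with a suitable XPath expression, for example .//img. The xpath method returns a list of all matching elements, on which you can get and set the src attribute.
The function etree.tostring(tree) returns the edited markup as a string: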
from lxml import etree
tree = etree.HTML('''<html>
<body>
<h1>Title</h1>
<img src="/media/old/another_logo.png" alt="" />
<p>Lorem Ipsum</p>
<p><img src="/media/old/logo.png" alt=""/></p>
</body>
</html>''')
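
# Select every <img> element and replace its src attribute
imgs = tree.xpath('.//img')
for img in imgs:
    print('OLD_SOURCE', img.get('src'))
    img.set('src', '/media/new/python.jpg')

# encoding='unicode' makes tostring return str instead of bytes
print(etree.tostring(tree, encoding='unicode'))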
Output
OLD_SOURCE /media/old/another_logo.png
OLD_SOURCE /media/old/logo.png
<html>
<body>
<h1>Title</h1>
<img src="/media/new/python.jpg" alt=""/>
<p>Lorem Ipsum</p>
<p><img src="/media/new/python.jpg" alt=""/></p>
</body>
</html>
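The example above rewrites src on every img element, whereas the question asks only for tags whose src equals one specific value. A minimal sketch of that narrower case, assuming the same lxml approach, filters on the attribute inside the XPath expression. The sample markup and the new path are made up for illustration; thm is the value from the question.

```python
from lxml import etree

# Value of thm taken from the question.
thm = '/public_media/cache/84/b5/84b59e293cbdb7041b68a84977d62cf3.jpg?image_pk=82'

# Illustrative markup: one image with the target src, one without.
tree = etree.HTML(
    '<html><body>'
    '<p><img src="%s" alt=""/></p>'
    '<p><img src="/media/old/logo.png" alt=""/></p>'
    '</body></html>' % thm
)

# Pass the value as an XPath variable so characters such as "?" or quotes
# in it need no escaping.
matches = tree.xpath('.//img[@src=$src]', src=thm)
for img in matches:
    img.set('src', '/media/new/python.jpg')  # hypothetical new path

result = etree.tostring(tree, encoding='unicode')
print(result)
```

Only the first image is changed here; the second keeps its original src. Because the comparison is done on the parsed attribute value, no regex escaping is involved.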
On using a Python regular expression to find and replace HTML tags with a specific attribute value, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20595735/