python - How do I extract the src from an img tag with a regular expression?

I am trying to extract the image source URL from an HTML img tag.

Suppose the HTML data looks like this:

<div> My profile <img width='300' height='300' src='http://domain.com/profile.jpg'> </div>

or

<div> My profile <img width="300" height="300" src="http://domain.com/profile.jpg"> </div>

What regular expression in Python would extract the URL in both cases?

I tried the following:

i = re.compile('(?P<src>src=[["[^"]+"][\'[^\']+\']])')
i.search(htmldata)

but calling .group() on the result raises an error:

Traceback (most recent call last):
File "<input>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
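The traceback comes from calling .group() on None: re.search returns None when the pattern does not match. The attempted pattern cannot match because [...] is a character class (a set of single characters), not a way to list alternatives. A minimal regex sketch that accepts either quote style captures the opening quote in a group and uses the backreference \1 to require the same closing quote. The helper names extract_src and first_src are hypothetical, and like any regex over HTML this assumes simple, well-formed tags.

```python
import re

# <img, then any attributes (lazily), then whitespace + src=.
# Group 1 captures the opening quote (' or "); the backreference \1
# requires the same quote to close the value, which lands in group "src".
SRC_RE = re.compile(r"""<img\b[^>]*?\ssrc\s*=\s*(["'])(?P<src>.*?)\1""", re.IGNORECASE)


def extract_src(html):
    """Return the src value of every <img> tag in html (hypothetical helper)."""
    # finditer yields only successful matches, so there is no None to guard against.
    return [m.group("src") for m in SRC_RE.finditer(html)]


def first_src(html):
    """Return the first <img> src, or None if there is none (hypothetical helper)."""
    m = SRC_RE.search(html)
    # search() returns None on no match; check before calling .group().
    return m.group("src") if m else None
```

For both sample snippets above, extract_src(htmldata) returns ['http://domain.com/profile.jpg']. Requiring whitespace before src keeps attributes such as data-src from matching.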

Best answer

A real HTML parser such as BeautifulSoup is the way to go.

>>> from bs4 import BeautifulSoup
>>> s = '''<div> My profile <img width='300' height='300' src='http://domain.com/profile.jpg'> </div>'''
>>> soup = BeautifulSoup(s, 'html.parser')
>>> img = soup.select('img')
>>> [i['src'] for i in img if i.get('src')]
['http://domain.com/profile.jpg']
>>>
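If installing BeautifulSoup is not an option, the standard library's html.parser.HTMLParser can do the same job. Its handle_starttag callback receives the attributes as (name, value) pairs with the quotes already stripped, so single- and double-quoted src values are handled alike. This is a sketch that swaps in the stdlib parser and is not part of the original answer; the names ImgSrcCollector and img_srcs are hypothetical.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs, with value None for a bare attribute.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags without a non-empty src
                self.srcs.append(src)

    # A self-closing <img ... /> is reported via handle_startendtag, whose
    # default implementation calls handle_starttag, so no override is needed.


def img_srcs(html):
    """Parse html and return the list of img src values."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.srcs
```

With the s from the session above, img_srcs(s) returns ['http://domain.com/profile.jpg'].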

A similar question about extracting the src from an img tag with a regular expression was found on Stack Overflow: https://stackoverflow.com/questions/33841638/
