I'm trying to scrape the value of the 'data-src' attribute from every element in this HTML:
[<div class="js-delayed-image-load" data-alt="A man covers his face during a sandstorm in Cairo, Egypt, 16 January 2019" data-height="549" data-src="https://ichef.bbci.co.uk/news/320/cpsprodpb/5DE9/production/_105214042_hi051682579.jpg" data-width="976"></div>,
...]
I'm using this code:
image_containers = soup.find_all('div', class_='js-delayed-image-load')
print(type(image_containers))
print(len(image_containers))
for image in image_containers:
    image.div['data-src']
It gives me this error:
TypeError                                 Traceback (most recent call last)
<ipython-input-546-fa82366c888d> in <module>()
      4 image_containers
      5 for image in image_containers:
----> 6     image.div['data-src']
TypeError: 'NoneType' object is not subscriptable
Why isn't it returning any data? Can someone tell me what I'm doing wrong?
Thanks!
Best answer
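Each image in the loop is already the target div node. You don't need to extract a div from it again (it has no child div, so image.div returns None, and indexing None raises the TypeError). Try this instead: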
for image in image_containers:
    # image is the matched <div>, so read its attribute directly
    print(image['data-src'])
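For reference, here is a minimal self-contained sketch of the same fix. It assumes Beautiful Soup 4 with the built-in html.parser backend. The sample markup is a hypothetical stand-in modeled on the question's HTML, and the second div (with no data-src) is made up purely to show a defensive lookup with Tag.get().

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modeled on the question's HTML.
# The second div deliberately has no data-src attribute.
html = """
<div class="js-delayed-image-load" data-alt="Sandstorm in Cairo"
     data-height="549" data-width="976"
     data-src="https://ichef.bbci.co.uk/news/320/cpsprodpb/5DE9/production/_105214042_hi051682579.jpg"></div>
<div class="js-delayed-image-load" data-alt="No source here"></div>
"""

soup = BeautifulSoup(html, "html.parser")
image_containers = soup.find_all("div", class_="js-delayed-image-load")

# Each item is already the matching <div> Tag, so read its attributes directly.
# Tag.get() returns None for a missing attribute instead of raising KeyError.
image_urls = []
for image in image_containers:
    url = image.get("data-src")
    if url is not None:
        image_urls.append(url)

print(image_urls)

# image.div searches for a *child* <div>; these containers have none,
# so the lookup yields None, which is what triggered the TypeError.
print(image_containers[0].div)
```

Using image['data-src'] is fine when every matched div is known to carry the attribute; image.get('data-src') is the more forgiving choice when some may not.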
Regarding python-3.x - Why can't I scrape everything inside the 'data-src' attribute of this HTML, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/54251708/