python - Scraping the URL of a dynamically changing image from a website

Tags: python web-scraping beautifulsoup python-requests

I'm writing a Python program that collects images from this website by Google.

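The image on the site changes every few seconds, and its URL changes along with it. The change is handled by a script on the page, and I don't know how to get the image links out of it.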

I tried to get the image links from the site's HTML using the BeautifulSoup and requests libraries:

import requests
from bs4 import BeautifulSoup

url = 'https://clients3.google.com/cast/chromecast/home'
html = requests.get(url).text
soup = BeautifulSoup(html, 'html.parser')
tags = soup('img')
for tag in tags:
    print(tag)

But instead of a real image URL, the code returns the template placeholder '{{backgroundUrl}}' in the image's source attribute (ng-src).

For example:

<img class="S9aygc-AHe6Kc" id="picture-background" image-error-handler="" image-index="0" ng-if="backgroundUrl" ng-src="{{backgroundUrl}}"/>

How can I get the image links from a site where they change dynamically? Can BeautifulSoup handle this? If not, which library would do the job?
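Why a static parser only sees the placeholder: ng-src="{{backgroundUrl}}" is an AngularJS template expression that is filled in only when the page's JavaScript runs in a browser. requests downloads the raw HTML without executing any scripts, so BeautifulSoup (or any other HTML parser) sees the unfilled placeholder. The sketch below reproduces this with Python's standard-library html.parser on the <img> tag quoted in the question; the class name ImgSrcCollector is hypothetical. Getting a real URL therefore means either running the page's JavaScript or finding the data elsewhere in the raw page source, which is what the answer below does.

```python
from html.parser import HTMLParser

# The <img> tag exactly as the question reports it in the downloaded HTML.
RAW_HTML = (
    '<img class="S9aygc-AHe6Kc" id="picture-background" image-error-handler="" '
    'image-index="0" ng-if="backgroundUrl" ng-src="{{backgroundUrl}}"/>'
)


class ImgSrcCollector(HTMLParser):
    """Hypothetical helper: collect the ng-src (or src) attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <img .../> also reach this method, because
        # HTMLParser.handle_startendtag calls handle_starttag by default.
        if tag == "img":
            attributes = dict(attrs)
            self.sources.append(attributes.get("ng-src") or attributes.get("src"))


collector = ImgSrcCollector()
collector.feed(RAW_HTML)
print(collector.sources)  # the unrendered AngularJS placeholder, not a URL
```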

Best answer

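import re

import requests


def main(url):
    # Search the raw page source (including inline scripts) instead of
    # the <img> tags, whose ng-src only holds a template placeholder.
    r = requests.get(url)
    r.raise_for_status()
    found = re.search(r"(lh4\.googl.+?mv)", r.text)
    if found is None:
        print("No image URL found")
        return
    # Undo the JavaScript escaping: drop backslashes and turn "\u003d" into "=".
    match = found.group(1).replace("\\", "").replace("u003d", "=")
    print(match)


main("https://clients3.google.com/cast/chromecast/home")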
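The answer keeps only the first match and unescapes it by hand. A variant sketch, assuming the page embeds the image URLs as JavaScript string literals containing escapes such as \/ and \u003d, collects every match with re.findall and lets json.loads undo the escapes. The sample page text and the function name extract_image_urls are hypothetical, not copied from the real site.

```python
import json
import re

# Hypothetical excerpt of page source; the real page's format may differ.
SAMPLE_PAGE = (
    'var data = ["https:\\/\\/lh4.googleusercontent.com\\/abc123\\u003ds1280-w1280-h720-p-k-no-nd-mv",'
    '"https:\\/\\/lh4.googleusercontent.com\\/def456\\u003ds1280-w1280-h720-p-k-no-nd-mv"];'
)

# Same pattern as the answer's regex, but used with findall to collect every match.
URL_PATTERN = re.compile(r"lh4\.googl.+?mv")


def extract_image_urls(page_text):
    """Return every lh4.googl...mv URL found in page_text, with escapes decoded."""
    urls = []
    for raw in URL_PATTERN.findall(page_text):
        # Wrap the match in quotes so the JSON decoder turns "\/" into "/"
        # and "\u003d" into "=" (assumes the match is a valid JSON string body).
        urls.append(json.loads('"' + raw + '"'))
    return urls


if __name__ == "__main__":
    for url in extract_image_urls(SAMPLE_PAGE):
        print(url)
```

As in the answer, the results start at the host name and do not include the https:// scheme.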

A similar question, "python - Scraping the URL of a dynamically changing image from a website", can be found on Stack Overflow: https://stackoverflow.com/questions/61158133/
