string - BeautifulSoup div class returns empty

Tags: string python-3.x class beautifulsoup

I've looked at similar questions but couldn't find a solution...

I'm trying to get the number of minutes of extra travel time (46) from this page: https://www.tomtom.com/en_gb/trafficindex/city/istanbul

I tried two approaches (XPath and finding the div by class), but both return empty results.

import requests
from bs4 import BeautifulSoup
from lxml.html import fromstring

page = requests.get("https://www.tomtom.com/en_gb/trafficindex/city/istanbul")
tree = fromstring(page.content)

soup = BeautifulSoup(page.content, 'html.parser')

#print([type(item) for item in list(soup.children)])

html = list(soup.children)[2]

g_data = soup.find_all("div", {"class_": "big.ng-binding"})

congestion = tree.xpath("/html/body/div/div[2]/div[2]/div[2]/section[2]/div/div[2]/div/div[2]/div/div[2]/div[1]/div[1]/text()")
print(congestion)
print(len(g_data))
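Independently of where the HTML comes from, the find_all call above cannot match anything. A key in the attrs dictionary is used literally, so {"class_": ...} searches for an attribute named class_, and "big.ng-binding" is not a class name (the dot is CSS selector syntax). A minimal sketch against a small hypothetical snippet, not the live page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like the element the question is after.
html = '<div class="text-big ng-binding">46</div>'
soup = BeautifulSoup(html, "html.parser")

# In an attrs dict the key is taken literally, so this searches for an
# attribute named "class_" with the value "big.ng-binding": nothing matches.
wrong = soup.find_all("div", {"class_": "big.ng-binding"})

# class_ is the keyword-argument spelling; one class name is enough to match.
by_keyword = soup.find_all("div", class_="text-big")

# Inside a dict, use the real attribute name "class".
by_attrs = soup.find_all("div", attrs={"class": "text-big"})

print(len(wrong))          # 0
print(by_keyword[0].text)  # 46
print(by_attrs[0].text)    # 46
```

Even with the call fixed, it can only find the div if that element is present in the HTML that requests downloads.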

Am I missing something obvious?

Thanks a lot for your help!

Best answer

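Unfortunately, BeautifulSoup alone is not enough here. The site generates this content with JavaScript, so it is not in the HTML that requests downloads; you need a tool that runs the page in a real browser, such as Selenium.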
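One way to confirm this diagnosis before reaching for Selenium is to check whether the value you see in the browser appears in the raw response at all. A minimal sketch; text_in_static_html is a hypothetical helper name, and the two HTML strings are made-up stand-ins for a script-rendered page and its rendered result:

```python
from bs4 import BeautifulSoup


def text_in_static_html(html: str, needle: str) -> bool:
    """Return True if `needle` appears in the visible text of `html`.

    `html` is meant to be the raw server response (e.g. requests.get(url).text).
    If a value shown in the browser is missing here, it is most likely
    filled in later by JavaScript.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Drop <script> contents so data embedded in JS doesn't count as visible text.
    for script in soup.find_all("script"):
        script.decompose()
    return needle in soup.get_text()


# Hypothetical shell page: the number only exists once a script runs.
raw = '<html><body><div id="app"></div><script>render(46)</script></body></html>'
rendered = '<html><body><div id="app"><div class="text-big">46</div></div></body></html>'

print(text_in_static_html(raw, "46"))       # False
print(text_in_static_html(rendered, "46"))  # True
```

On the live page you would pass requests.get(url).text; a False result points to client-side rendering. It is only a heuristic, since the same digits could appear elsewhere in the page text.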

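import bs4 as bs
import re
from selenium import webdriver

url = 'https://www.tomtom.com/en_gb/trafficindex/city/istanbul'

driver = webdriver.Firefox()
driver.get(url)
html = driver.page_source  # the DOM as rendered by the browser, not the raw server response
driver.quit()              # close the browser once the HTML has been captured
soup = bs.BeautifulSoup(html, 'html.parser')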

I can see two ways to get the extra time:

1. Find the div with class="text-big ng-binding".

div = soup.find_all('div', attrs={'class' : 'text-big ng-binding'})
result = div[0].text
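Matching on attrs={'class': 'text-big ng-binding'} relies on the class attribute being exactly that string, in that order. A CSS selector through select() is a more forgiving alternative. A minimal sketch on a hypothetical snippet with the same two classes written in both orders:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the same two classes, written in both orders.
html = """
<div class="text-big ng-binding">46</div>
<div class="ng-binding text-big">46</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A value containing a space is compared with the whole class attribute
# string, so only the div with exactly "text-big ng-binding" matches.
exact = soup.find_all("div", attrs={"class": "text-big ng-binding"})

# A CSS selector requires both classes, in any order.
either_order = soup.select("div.text-big.ng-binding")

print(len(exact))         # 1
print(len(either_order))  # 2
```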

2. First find the text containing "Per day", then step back two divs with find_previous.

# find() returns a single match; find_all() returns a list, which has no find_previous()
label = soup.find(string=re.compile('Per day'))
result = label.find_previous('div').find_previous('div').text
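To see why two find_previous('div') calls are needed, here is the same lookup on a hypothetical snippet. The real page's structure may differ; this assumes the number sits in the div immediately before the div holding "Per day":

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: the value sits in the div just before the "Per day" label.
html = """
<div class="card">
  <div class="text-big ng-binding">46</div>
  <div class="text-small">Per day</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns one NavigableString (or None if there is no match).
label = soup.find(string=re.compile("Per day"))

# The first find_previous("div") is the div that contains the label itself;
# the second is the div before it, which holds the number.
value_div = label.find_previous("div").find_previous("div")
minutes = int(value_div.get_text(strip=True))
print(minutes)  # 46
```

Anchoring on a visible label tends to survive class-name changes better than matching on Angular-generated classes, but it breaks if the label text or the order of the divs changes.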

For string - BeautifulSoup div class returns empty, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/47737796/
