python - How do I parse two elements that are stuck together?

Tags: python beautifulsoup

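I want to get the rating and the number of votes from zomato.com, but unfortunately these two elements seem to be stuck together. It's hard to explain, so I made a quick video to show what I mean.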

https://streamable.com/sdh0w

Full code: https://pastebin.com/JFKNuK2a

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
response = requests.get("https://www.zomato.com/san-francisco/restaurants?q=restaurants&page=1",headers=headers)
content = response.content
bs = BeautifulSoup(content,"html.parser")

zomato_containers = bs.find_all("div", {"class": "search-snippet-card"})


for zomato_container in zomato_containers:
    rating = zomato_container.find('div', {'class': 'search_result_rating'})
    # numVotes = zomato_container.find("div", {"class": "rating-votes-div"})

    print("rating: ", rating.get_text().strip())
    # print("numVotes: ", numVotes.get_text(strip=True))
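The "stuck together" effect comes from get_text(): called on a parent element, it concatenates the text of all its descendants with no separator by default. A minimal sketch using a small hypothetical HTML snippet; the class names are taken from the question and answer, but the exact nesting on the real Zomato page is an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: assumes the rating and the vote count are two sibling
# elements inside one rating block, with no whitespace between them.
HTML = (
    '<div class="search-snippet-card">'
    '<div class="search_result_rating">'
    '<div class="rating-popup">4.9</div>'
    '<span class="rating-votes-div">344 votes</span>'
    "</div>"
    "</div>"
)

soup = BeautifulSoup(HTML, "html.parser")
block = soup.find("div", {"class": "search_result_rating"})

# get_text() joins every text node with an empty separator, so the values run together.
glued = block.get_text()
# Passing a separator (and strip=True) keeps the pieces apart.
spaced = block.get_text(" ", strip=True)
# stripped_strings yields each non-blank text node on its own.
pieces = list(block.stripped_strings)
# Or select each child element directly, as the answer below does.
rating = block.select_one(".rating-popup").get_text(strip=True)
votes_text = block.select_one('[class^="rating-votes"]').get_text(strip=True)

print(glued)
print(spaced)
print(pieces)
print(rating, votes_text)
```

Selecting each child separately is usually the more robust option, because it does not depend on how the text nodes happen to be ordered or spaced.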

Best answer

You can use the re module to parse the vote count:

import re
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
response = requests.get("https://www.zomato.com/san-francisco/restaurants?q=restaurants&page=1",headers=headers)
content = response.content
bs = BeautifulSoup(content,"html.parser")

zomato_containers = bs.find_all("div", {"class": "search-snippet-card"})

for zomato_container in zomato_containers:
    print('name:', zomato_container.select_one('.result-title').get_text(strip=True))
    print('rating:', zomato_container.select_one('.rating-popup').get_text(strip=True))
    votes = ''.join( re.findall(r'\d', zomato_container.select_one('[class^="rating-votes"]').text) )
    print('votes:', votes)
    print('*' * 80)

Prints:

name: The Original Ghirardelli Ice Cream and Chocolate...
rating: 4.9
votes: 344
********************************************************************************
name: Tadich Grill
rating: 4.6
votes: 430
********************************************************************************
name: Delfina
rating: 4.8
votes: 718
********************************************************************************

...and so on.
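The regex line in the answer keeps every digit character of the vote label and joins them. A minimal stdlib sketch of the same idea as a reusable helper; the sample labels, including the comma-separated one, are hypothetical, and the helper assumes the label contains exactly one number.

```python
import re
from typing import Optional


def parse_votes(text: str) -> Optional[int]:
    """Keep only the digit characters of a vote label and convert them to an int.

    Assumes the label holds exactly one number (e.g. "344 votes"); any other
    digits in the text would be glued onto it. Returns None if there are no digits.
    """
    digits = "".join(re.findall(r"\d", text))
    return int(digits) if digits else None


# Hypothetical labels; comma separators are dropped because only digits are kept.
for label in ["344 votes", "1,234 votes", "\n  718 votes "]:
    print(repr(label), "->", parse_votes(label))
```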

Or:

If you don't want to use re, you can use str.split():

votes = zomato_container.select_one('[class^="rating-votes"]').get_text(strip=True).split()[0]
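Note that str.split() returns strings, so the one-liner gives "344" rather than the number 344. A minimal sketch that wraps it and converts to int, assuming (hypothetically) that the count is always the first whitespace-separated token and may contain comma thousands separators:

```python
def votes_from_first_token(text: str) -> int:
    """Parse a vote label whose count is the first whitespace-separated token.

    Assumes a label shaped like "344 votes"; raises IndexError on an empty
    label and ValueError if the first token is not a number.
    """
    first = text.split()[0]  # "344 votes" -> "344"
    # split() yields strings, so convert explicitly after removing thousands separators.
    return int(first.replace(",", ""))


for label in ["344 votes", "1,234 votes", "718"]:
    print(label, "->", votes_from_first_token(label))
```

Unlike the regex version, this stops at the first token, so stray digits later in the label (e.g. "344 votes (2 new)") are ignored; in exchange it fails loudly if the label does not start with the number.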

For "python - How do I parse two elements that are stuck together?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57370999/
