python - Get text from next_sibling - BeautifulSoup 4

Tags: python python-3.x beautifulsoup

I want to scrape the restaurants from this URL:

for rests in dining_soup.select("div.infos-restos"):
    
    for rest in rests.select("h3"):
        safe_print("            Rest Name: "+rest.text)
        print(rest.next_sibling.next_sibling.next_sibling.next_sibling.contents)

Output:

<div class="descriptif-resto">
<p>
<strong>Type of cuisine</strong>:International</p>
<p>
<strong>Opening hours</strong>:06:00-23:30</p>
<p>The Food Square bar and restaurant offers a varied menu in an elegant and welcoming setting. In fine weather you can also enjoy your meal next to the pool or relax on the garden terrace.</p>
</div>

print(rest.next_sibling.next_sibling.next_sibling.next_sibling.text)

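The output is always empty.

So my question is: how do I scrape the type of cuisine and the opening hours from that div?

Best answer

The opening hours and the cuisine are in the "descriptif-resto" text: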

import requests
from bs4 import BeautifulSoup
r = requests.get("http://www.accorhotels.com/gb/hotel-5548-mercure-niederbronn-hotel/restaurant.shtml")
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(r.content, "html.parser")

print(soup.find("div",attrs={"class":"descriptif-resto"}).text)

Type of cuisine:Brasserie

Opening hours:12:00 - 14:00 / 19:00 - 22:00

The name is in the first h3 tag, and the type and the opening hours are in the two p tags:

name = soup.find("div", attrs={"class":"infos-restos"}).h3.text
det = soup.find("div",attrs={"class":"descriptif-resto"}).p   

hours = det.find_next("p").text
tpe = det.text
print(name)
print(hours)
print(tpe)

LA STUB DU CASINO

Opening hours:12:00 - 14:00 / 19:00 - 22:00

Type of cuisine:Brasserie
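
As a side note on the sibling chain in the question: `next_sibling` returns whatever node comes next, including whitespace-only text nodes between tags, which is why several hops were needed to reach the div. The sketch below shows `find_next_sibling`, which skips text nodes. The HTML string is a hypothetical stand-in modelled on the snippet in the question, not the live page, so the real markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the snippet in the question (an assumption,
# not a copy of the live page).
html = """
<div class="infos-restos">
<h3>FOOD SQUARE</h3>
<div class="descriptif-resto">
<p>
<strong>Type of cuisine</strong>:International</p>
<p>
<strong>Opening hours</strong>:06:00-23:30</p>
</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
h3 = soup.find("h3")

# The node right after </h3> is the newline text node, not the <div>.
first_sibling = h3.next_sibling
print(repr(first_sibling))

# find_next_sibling ignores text nodes and returns the next matching tag.
desc = h3.find_next_sibling("div", class_="descriptif-resto")

# get_text(strip=True) trims the whitespace around each text fragment.
lines = [p.get_text(strip=True) for p in desc.find_all("p")]
print(lines)
```

This avoids counting how many `.next_sibling` hops are needed, which changes whenever the page's whitespace changes.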

OK, some places have no opening hours or no cuisine listed, so you will have to fine-tune it, but this gets all the info:

from itertools import chain

all_dets = soup.find_all("div", attrs={"class":"infos-restos"})
# get all names from the h3 tags, using chain so we can zip later
names = chain.from_iterable(x.find_all("h3") for x in all_dets)
# get all the info needed to extract cuisine and hours
det = chain.from_iterable(x.find_all("div",attrs={"class":"descriptif-resto"}) for x in all_dets)
# zip the appropriate details with each name
zipped = zip(names, det)

for name, det in zipped:
    details = det.p
    name, tpe = name.text, details
    hours = details.find_next("p") if "cuisine" in det.p.text else ""
    if hours: # empty string means we have a bar
        print(name, tpe.text, hours.text)
    else:
        print(name, tpe.text)
    print("-----------------------------")

LA STUB DU CASINO 
Type of cuisine:Brasserie 
Opening hours:12:00 - 14:00 / 19:00 - 22:00
-----------------------------
RESTAURANT DU CASINO IVORY 
Type of cuisine:French 
Opening hours:19:00 - 22:00
-----------------------------
BAR DE L'HOTEL LE DOLLY 
Opening hours:10:00-01:00 
-----------------------------
BAR DES MACHINES A SOUS 
Opening hours:10:30-03:00 
-----------------------------
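
One possible way to do the fine-tuning mentioned above is to key each value by its `<strong>` label instead of relying on paragraph order, so that a missing field simply stays absent. This is a sketch under assumptions: `parse_details` is a hypothetical helper, and the HTML is a made-up sample that mirrors the output above (each h3 followed by a sibling descriptif-resto div), not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one restaurant with both fields, one bar with hours only.
html = """
<div class="infos-restos">
<h3>LA STUB DU CASINO</h3>
<div class="descriptif-resto">
<p><strong>Type of cuisine</strong>:Brasserie</p>
<p><strong>Opening hours</strong>:12:00 - 14:00 / 19:00 - 22:00</p>
</div>
<h3>BAR DE L'HOTEL LE DOLLY</h3>
<div class="descriptif-resto">
<p><strong>Opening hours</strong>:10:00-01:00</p>
</div>
</div>
"""


def parse_details(desc):
    """Map each <strong> label in a descriptif-resto div to the text after it."""
    details = {}
    for p in desc.find_all("p"):
        label = p.find("strong")
        if label is None:
            continue  # free-text paragraph without a label
        key = label.get_text(strip=True)
        text = p.get_text(strip=True)
        # Drop the label and the leading ":" separator; keep the rest as the value.
        details[key] = text[len(key):].lstrip(":").strip()
    return details


soup = BeautifulSoup(html, "html.parser")
results = {}
for h3 in soup.select("div.infos-restos h3"):
    desc = h3.find_next_sibling("div", class_="descriptif-resto")
    results[h3.get_text(strip=True)] = parse_details(desc) if desc else {}

for name, details in results.items():
    # .get() supplies a placeholder when a field is missing, as for the bar.
    print(name, details.get("Type of cuisine", "n/a"), details.get("Opening hours", "n/a"))
```

One caveat: if an h3 had no description div of its own, `find_next_sibling` would pick up the next restaurant's div, so on real pages it may be worth checking that the div found comes before the following h3.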

Source: this question, "python - Get text from next_sibling - BeautifulSoup 4", is from Stack Overflow: https://stackoverflow.com/questions/28519164/
