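The code below returns the numbers for every label on the page. Is there a way to use a filter so that I extract them once per region?
For example, on https://opensignal.com/reports/2019/04/uk/mobile-network-experience I am only interested in the numbers under the Regional Analysis tab, for all regions.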
import requests
from bs4 import BeautifulSoup

html = requests.get("https://opensignal.com/reports/2019/04/uk/mobile-network-experience").text
soup = BeautifulSoup(html, 'html.parser')
items = soup.find_all('div', class_='c-ru-graph__rect')
for item in items:
    provider = item.find('span', class_='c-ru-graph__label').text
    prodvalue = item.find_next_sibling('span').find('span', class_='c-ru-graph__number').text
    print(provider + " : " + prodvalue)
I would like a table or DataFrame like the one below, shown here for the Eastern region:
                         O2    Vodafone  3     EE
4G Availability          82    76.9      73.0  89.2
Upload Speed Experience  5.6   5.9       6.8   9.5
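Any pointers that would help me get this result?
Best answer
Here is how I would do it for all regions. It requires bs4 4.7.1 or later, which supports the :has() pseudo-class used in the selector. AFAICS you have to assume the provider order is the same in every chart.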
import requests
from bs4 import BeautifulSoup
import pandas as pd

r = requests.get("https://opensignal.com/reports/2019/04/uk/mobile-network-experience")
soup = BeautifulSoup(r.content, 'lxml')  # use 'html.parser' if lxml is not installed

metrics = ['4g-availability', 'video-experience', 'download-speed', 'upload-speed', 'latency']
headers = ['O2', 'Vodafone', '3', 'EE']  # assumed to match the provider order in every chart
results = []

for region in soup.select('.s-regional-analysis__region'):
    for metric in metrics:
        # Numbers from the chart inside this region that carries the given data-metric
        providers = [item.text for item in region.select('.c-ru-chart:has([data-metric="' + metric + '"]) .c-ru-graph__number')]
        row = {headers[i]: providers[i] for i in range(len(providers))}
        row['data-metric'] = metric
        row['region'] = region['id']
        results.append(row)

df = pd.DataFrame(results, columns=['region', 'data-metric'] + headers)
print(df)
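To see what the :has() selector is doing without hitting the live site, here is a minimal sketch run against an inline HTML snippet. The class and attribute names are the ones used in the selectors above, but the nesting of the markup, the region id "eastern", and the helper name numbers_for are assumptions made for illustration; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: names follow the selectors in the answer,
# but the nesting is a guess and not copied from the real page.
html = """
<div class="s-regional-analysis__region" id="eastern">
  <div class="c-ru-chart">
    <div data-metric="4g-availability"></div>
    <span class="c-ru-graph__number">82</span>
    <span class="c-ru-graph__number">76.9</span>
  </div>
  <div class="c-ru-chart">
    <div data-metric="upload-speed"></div>
    <span class="c-ru-graph__number">5.6</span>
    <span class="c-ru-graph__number">5.9</span>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
region = soup.select_one(".s-regional-analysis__region")


def numbers_for(region, metric):
    """Return the number labels of the chart that contains the given data-metric."""
    # :has(...) keeps only the .c-ru-chart elements with a matching descendant,
    # then the descendant combinator picks the number labels inside that chart.
    selector = f'.c-ru-chart:has([data-metric="{metric}"]) .c-ru-graph__number'
    return [tag.get_text(strip=True) for tag in region.select(selector)]


print(region["id"], numbers_for(region, "4g-availability"))
print(region["id"], numbers_for(region, "upload-speed"))
```

Because select() is called on the region element rather than on the whole soup, only numbers inside that region are returned, which is what scopes the extraction to one region at a time. A metric with no matching chart simply yields an empty list.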
Sample output: (not preserved in this copy)
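The answer's DataFrame holds one row per region and metric, while the question asked for one table per region with metrics as rows and providers as columns. A possible way to get that layout is sketched below. The rows are built by hand in the same shape the scraping loop produces: the "eastern" values come from the table in the question, the region id "eastern" is an assumption, the second region holds dummy placeholder values, and region_table is a hypothetical helper name.

```python
import pandas as pd

providers = ["O2", "Vodafone", "3", "EE"]

# Hand-built rows in the shape produced by the scraping loop.
# "eastern" values are from the question; "placeholder-region" values are dummies.
results = [
    {"region": "eastern", "data-metric": "4g-availability",
     "O2": "82", "Vodafone": "76.9", "3": "73.0", "EE": "89.2"},
    {"region": "eastern", "data-metric": "upload-speed",
     "O2": "5.6", "Vodafone": "5.9", "3": "6.8", "EE": "9.5"},
    {"region": "placeholder-region", "data-metric": "4g-availability",
     "O2": "1", "Vodafone": "2", "3": "3", "EE": "4"},
]

df = pd.DataFrame(results, columns=["region", "data-metric"] + providers)

# Scraped values arrive as strings; convert the provider columns to numbers.
df[providers] = df[providers].apply(pd.to_numeric, errors="coerce")


def region_table(df, region):
    """Return one region's metrics as rows and providers as columns."""
    subset = df[df["region"] == region]
    return subset.set_index("data-metric")[providers]


eastern = region_table(df, "eastern")
print(eastern)
```

Using errors="coerce" turns any label that is not a plain number into NaN instead of raising, which seems a reasonable default for scraped text; drop it if you would rather fail loudly.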
For more on "python - Webscrape interactive chart in Python using Beautiful Soup with loop", see the corresponding question on Stack Overflow: https://stackoverflow.com/questions/56081493/