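I am trying to scrape a specific web page, but I cannot find all of the paragraph tags in it.
I have already gone through the following question:
Beautiful Soup findAll doesn't find them all, but it does not seem to solve my problem.
The page is dynamic and keeps refreshing; clicking the "Load more commentary" button at the bottom of the page loads additional content.
Code: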
from bs4 import BeautifulSoup
import requests
r = requests.get("http://www.cricbuzz.com/live-cricket-scores/18127")
data = r.text
soup = BeautifulSoup(data, "html.parser")  # name the parser explicitly to avoid the "no parser specified" warning
p = soup.find_all('p')
len(p)
10
print(p[9])
Boult to Hardik Pandya, FOUR, that is probably the blunder which will cost KKR the match. It shouldn't have been any more than a single. A low full toss which Hardik can't find any elevation with. He smacks it down to long-on, where Surya attacks the ball nicely but he misfields and the ball sneaks through
Is it possible to scrape the entire commentary from this page?
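For context on why only 10 paragraphs show up: requests downloads the HTML as served and never runs the page's JavaScript, so paragraphs that the browser adds later are simply not in r.text. The sketch below illustrates this with the standard library only. The markup in STATIC_HTML is a made-up stand-in, not the real Cricbuzz page, and ParagraphCounter and count_paragraphs are hypothetical helper names.

```python
from html.parser import HTMLParser


class ParagraphCounter(HTMLParser):
    """Counts the <p> start tags present in the raw markup."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.count += 1


# Hypothetical stand-in for what requests.get(...).text would contain:
# two paragraphs in the served markup, plus a script that a browser
# would run to load further commentary.
STATIC_HTML = """
<html><body>
  <p>First ball of the over.</p>
  <p>Second ball of the over.</p>
  <button id="load-more">Load more commentary</button>
  <script>
    // In a browser, this would fetch and append further paragraph elements.
  </script>
</body></html>
"""


def count_paragraphs(html):
    """Return the number of <p> tags found in the given HTML string."""
    parser = ParagraphCounter()
    parser.feed(html)
    parser.close()
    return parser.count


# Only the paragraphs already in the served markup are counted.
print(count_paragraphs(STATIC_HTML))
```

Nothing the script would append is visible to the parser, which matches the behaviour described in the question.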
Best answer
To get all of the commentary, you can use the site's API:
http://push.cricbuzz.com/match-api/18127/commentary-full.json
It returns all of the data as JSON, which you can easily parse to extract what you need:
import requests
r = requests.get('http://push.cricbuzz.com/match-api/18127/commentary-full.json').json()
all_comments = r['comm_lines']
# Print the commentary text from the first 10 entries (entries without a 'comm' key are skipped)
for comment in all_comments[:10]:
    if 'comm' in comment:
        print(comment['comm'])
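If you want to reuse the extraction step, it could be wrapped in a small function along these lines. This is only a sketch: extract_comments is a hypothetical helper, and sample_payload is a made-up dictionary that assumes the response has the shape used above (a 'comm_lines' list whose entries may or may not carry a 'comm' key). In practice you would pass in the result of requests.get(url).json().

```python
def extract_comments(payload, limit=None):
    """Return the text of every entry in 'comm_lines' that has a 'comm' key.

    If limit is given, at most that many comment strings are returned.
    """
    lines = payload.get("comm_lines", [])  # tolerate a missing key
    comments = [line["comm"] for line in lines if "comm" in line]
    return comments if limit is None else comments[:limit]


# Made-up payload for illustration only; not real API output.
sample_payload = {
    "comm_lines": [
        {"comm": "first comment"},
        {"other_field": "entry without commentary text"},
        {"comm": "second comment"},
    ]
}

for text in extract_comments(sample_payload, limit=10):
    print(text)
```

Note that this filters first and then applies the limit, so limit=10 yields up to 10 actual comments, whereas slicing all_comments[:10] before filtering can yield fewer.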
This question, about Python web scraping failing to find all of the tags in a page, comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/43315307/