Python web scraping cannot find all the tags on a web page

I am trying to scrape a particular web page, but I cannot find all of the paragraph tags in it.

I have already gone through the following question:

"Beautiful Soup findAll doesn't find them all", but it does not seem to solve my problem.

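This is a dynamic page that keeps refreshing, and additional content is loaded when I click the "Load more commentary" button at the bottom of the page.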

Code:

from bs4 import BeautifulSoup
import requests

r = requests.get("http://www.cricbuzz.com/live-cricket-scores/18127")
data = r.text

soup = BeautifulSoup(data)
p = soup.find_all('p')

len(p)

10

print(p[9])

Boult to Hardik Pandya, FOUR, that is probably the blunder which will cost KKR the match. It shouldn't have been any more than a single. A low full toss which Hardik can't find any elevation with. He smacks it down to long-on, where Surya attacks the ball nicely but he misfields and the ball sneaks through

Is there a way to scrape the entire commentary data from this page?

Best answer

To get all of the commentary, you can use the site's API: http://push.cricbuzz.com/match-api/18127/commentary-full.json. It returns all the data in JSON format, which you can easily parse to extract what you need:

import requests

# fetch the full commentary and parse the JSON body into a dict
data = requests.get('http://push.cricbuzz.com/match-api/18127/commentary-full.json').json()

all_comments = data['comm_lines']

# print first 10 comments
for comment in all_comments[:10]:
    if 'comm' in comment:
        print(comment['comm'])
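
A slightly more defensive version of the same idea might look like the sketch below. It is not part of the original answer. The helper names `extract_comments` and `fetch_commentary`, the `limit` and `timeout` parameters, the `evt` key in the sample data, and the idea that the URL can be built from any match id are all assumptions made for illustration. The endpoint itself may also have changed or been retired since the answer was written.

```python
def extract_comments(payload, limit=None):
    """Return the text of every entry in 'comm_lines' that has a 'comm' field."""
    lines = payload.get('comm_lines', [])
    comments = [line['comm'] for line in lines if 'comm' in line]
    return comments if limit is None else comments[:limit]


def fetch_commentary(match_id, timeout=10):
    """Download the commentary JSON for one match.

    The URL pattern is generalized from the single URL in the answer above;
    that it works for other match ids is an assumption.
    """
    import requests  # imported here so extract_comments works without requests installed

    url = 'http://push.cricbuzz.com/match-api/{}/commentary-full.json'.format(match_id)
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.json()


# Hypothetical payload that only mimics the shape used in the answer
# ('comm_lines' holding dicts, some of them without a 'comm' key).
sample = {
    'comm_lines': [
        {'comm': 'Boult to Hardik Pandya, FOUR'},
        {'evt': 'over-break'},
        {'comm': 'Boult to Hardik Pandya, 1 run'},
    ]
}

print(extract_comments(sample))
```

Against the live API the call would be `extract_comments(fetch_commentary(18127), limit=10)`. Unlike the loop above, which looks at the first 10 entries and skips those without text, this filters first and then truncates, so `limit=10` yields up to 10 actual comments.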

This question and answer originate from Stack Overflow: https://stackoverflow.com/questions/43315307/
