python - Requests response leaves out some data

Tags: python web-scraping beautifulsoup python-requests

I'm trying to scrape some data from Google using requests, but the response doesn't contain everything that appears on the web page.

Code:

import requests
from bs4 import BeautifulSoup

headers = {'user-agent':'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Mobile Safari/537.36'}
url = 'https://www.google.com/search?num=50&q="potato+is+good"'

r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')

for idx, val in enumerate(soup.find_all('em'), 1):
    print('{} = {}'.format(idx, val))

Output:

1 = <em>potato is good</em>
2 = <em>potato is good</em>
3 = <em>potato is good</em>
4 = <em>Potato is good</em>
5 = <em>potato is good</em>
6 = <em>potato is good</em>
7 = <em>Potato is good</em>
8 = <em>potato is good</em>
9 = <em>potato is good</em>
10 = <em>potato is good</em>
11 = <em>potato.is.good</em>

It shows only 11 results, but when I run the same search manually on Google there are more than 35.

What could be wrong with my code?

Best answer

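Is it returning the results you would get when searching from a mobile device? The user agent in your code identifies an Android phone. I just tried it, and I get only 11 results on the first page of Google on my iPhone. Perhaps a different user agent, such as the one below, would solve the problem?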

Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36
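To make the comparison concrete, here is a minimal sketch that prepares the same search for both user agents without sending anything. It uses only the standard library (`urllib`) rather than requests so that it runs offline; `build_search_request` is a hypothetical helper name, not part of any library. Building the query string with `urlencode` also percent-encodes the double quotes around the phrase instead of leaving them raw in the URL.

```python
from urllib.parse import urlencode
from urllib.request import Request

# User-agent strings: the mobile one from the question, the desktop one suggested here.
MOBILE_UA = ('Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) '
             'AppleWebKit/537.36 (KHTML, like Gecko) '
             'Chrome/63.0.3239.132 Mobile Safari/537.36')
DESKTOP_UA = ('Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36')


def build_search_request(query, user_agent, num=50):
    """Hypothetical helper: prepare (but do not send) a Google search request."""
    # urlencode percent-encodes the quotes and turns spaces into '+'.
    url = 'https://www.google.com/search?' + urlencode({'num': num, 'q': query})
    return Request(url, headers={'user-agent': user_agent})


mobile_req = build_search_request('"potato is good"', MOBILE_UA)
desktop_req = build_search_request('"potato is good"', DESKTOP_UA)

# Same URL, different identity: only the User-Agent header differs.
print(mobile_req.full_url)
print(desktop_req.get_header('User-agent'))
```

With requests, the equivalent would be passing `params={'num': 50, 'q': '"potato is good"'}` and `headers={'user-agent': ...}` to `requests.get`.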

Edit:

I ran this:

import requests
from bs4 import BeautifulSoup

headers = {'user-agent':'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
url = 'https://www.google.com/search?num=50&q="potato+is+good"'

r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')

for idx, val in enumerate(soup.find_all('em'), 1):
    print('{} = {}'.format(idx, val))

and got:

1 = <em>potato is good</em>
2 = <em>potato is good</em>
3 = <em>potato is good</em>
4 = <em>potato is good</em>
5 = <em>potato” is good</em>
6 = <em>potato is good</em>
7 = <em>potato-is good</em>
8 = <em>potato is good</em>
9 = <em>Potato is good</em>
10 = <em>potato is good</em>
11 = <em>potato is good</em>
12 = <em>potato is good</em>
13 = <em>potato is good</em>
14 = <em>potato is good</em>
15 = <em>potato is good</em>
16 = <em>potato is good</em>
17 = <em>Potato is good</em>
18 = <em>potato is good</em>
19 = <em>potato is good</em>
20 = <em>potato is good</em>
21 = <em>potato is good</em>
22 = <em>potato is good</em>
23 = <em>potato is good</em>
24 = <em>potato is good</em>
25 = <em>potato is good</em>
26 = <em>potato is good</em>
27 = <em>potato is good</em>
28 = <em>potato is good</em>
29 = <em>potato is good</em>
30 = <em>potato is good</em>
31 = <em>potato is good</em>
32 = <em>potato is good</em>
33 = <em>potato is good</em>
34 = <em>potato is good</em>
35 = <em>potato is good</em>
36 = <em>potato is good</em>
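If you only want to compare how many highlighted matches each user agent returns, counting the `<em>` tags is enough. The sketch below is one way to do that with the standard library's `html.parser` so it can run on any HTML string; `EmCounter` and `count_em` are hypothetical names, and the sample markup is made up for illustration. With BeautifulSoup, `len(soup.find_all('em'))` gives the same count.

```python
from html.parser import HTMLParser


class EmCounter(HTMLParser):
    """Counts opening <em> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == 'em':
            self.count += 1


def count_em(html):
    """Return the number of <em> elements found in the given HTML string."""
    parser = EmCounter()
    parser.feed(html)
    parser.close()
    return parser.count


# Made-up sample markup; in practice, pass the response body (e.g. r.text).
sample = '<div><em>potato is good</em> and <em>Potato is good</em></div>'
print(count_em(sample))
```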

Source: "python - Requests response leaves out some data" on Stack Overflow: https://stackoverflow.com/questions/48745678/