I'm trying to scrape some data from Google using requests, but the response doesn't contain everything that appears on the page.
Code:
import requests
from bs4 import BeautifulSoup
headers = {'user-agent':'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Mobile Safari/537.36'}
url = 'https://www.google.com/search?num=50&q="potato+is+good"'
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
for idx, val in enumerate(soup.find_all('em'), 1):
    print('{} = {}'.format(idx, val))
Output:
1 = <em>potato is good</em>
2 = <em>potato is good</em>
3 = <em>potato is good</em>
4 = <em>Potato is good</em>
5 = <em>potato is good</em>
6 = <em>potato is good</em>
7 = <em>Potato is good</em>
8 = <em>potato is good</em>
9 = <em>potato is good</em>
10 = <em>potato is good</em>
11 = <em>potato.is.good</em>
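It only shows 11 results, but when I run the same search manually on Google there are more than 35.
What could be wrong with my code?
Best answer
Is it returning results as if you were searching from a mobile device? I just tried, and on my iPhone I only get 11 results on the first page of Google. Maybe a different user agent (like the one below) would solve the problem?
Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36
Edit:
I ran this: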
import requests
from bs4 import BeautifulSoup
headers = {'user-agent':'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
url = 'https://www.google.com/search?num=50&q="potato+is+good"'
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
for idx, val in enumerate(soup.find_all('em'), 1):
    print('{} = {}'.format(idx, val))
And got:
1 = <em>potato is good</em>
2 = <em>potato is good</em>
3 = <em>potato is good</em>
4 = <em>potato is good</em>
5 = <em>potato” is good</em>
6 = <em>potato is good</em>
7 = <em>potato-is good</em>
8 = <em>potato is good</em>
9 = <em>Potato is good</em>
10 = <em>potato is good</em>
11 = <em>potato is good</em>
12 = <em>potato is good</em>
13 = <em>potato is good</em>
14 = <em>potato is good</em>
15 = <em>potato is good</em>
16 = <em>potato is good</em>
17 = <em>Potato is good</em>
18 = <em>potato is good</em>
19 = <em>potato is good</em>
20 = <em>potato is good</em>
21 = <em>potato is good</em>
22 = <em>potato is good</em>
23 = <em>potato is good</em>
24 = <em>potato is good</em>
25 = <em>potato is good</em>
26 = <em>potato is good</em>
27 = <em>potato is good</em>
28 = <em>potato is good</em>
29 = <em>potato is good</em>
30 = <em>potato is good</em>
31 = <em>potato is good</em>
32 = <em>potato is good</em>
33 = <em>potato is good</em>
34 = <em>potato is good</em>
35 = <em>potato is good</em>
36 = <em>potato is good</em>
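The difference between the two runs above comes down to the user-agent header alone, so it can help to compare the counts side by side. Here is a minimal sketch of that comparison, not a definitive implementation. The helper names (`count_em`, `build_search_url`, `compare_user_agents`, `fetch_with_requests`) are hypothetical and not part of the original posts. It counts `<em>` tags with the standard library's `html.parser` instead of BeautifulSoup so the counting step has no extra dependencies, and it assumes Google still serves different markup per user agent, which may not hold.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# The two user-agent strings used in the question and in the answer.
MOBILE_UA = ('Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) '
             'AppleWebKit/537.36 (KHTML, like Gecko) '
             'Chrome/63.0.3239.132 Mobile Safari/537.36')
DESKTOP_UA = ('Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36')


class EmCounter(HTMLParser):
    """Counts opening <em> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == 'em':
            self.count += 1


def count_em(html):
    """Return the number of <em> tags in the given HTML string."""
    parser = EmCounter()
    parser.feed(html)
    return parser.count


def build_search_url(query, num=50):
    """Build the search URL, letting urlencode escape quotes and spaces."""
    return 'https://www.google.com/search?' + urlencode({'num': num, 'q': query})


def compare_user_agents(fetch, url, user_agents):
    """Fetch url once per user agent and return {label: <em> count}.

    fetch is any callable taking (url, user_agent) and returning HTML text,
    which keeps the comparison testable without network access.
    """
    return {label: count_em(fetch(url, ua)) for label, ua in user_agents.items()}


def fetch_with_requests(url, user_agent):
    """Hypothetical fetcher built on requests (third-party, imported lazily)."""
    import requests
    response = requests.get(url, headers={'user-agent': user_agent}, timeout=10)
    response.raise_for_status()
    return response.text
```

A call such as `compare_user_agents(fetch_with_requests, build_search_url('"potato is good"'), {'mobile': MOBILE_UA, 'desktop': DESKTOP_UA})` would then return one count per label. Passing the fetcher in as an argument is a design choice that allows the counting logic to be checked against canned HTML before any real request is sent.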
Regarding "python - requests response leaves out some data", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48745678/