I want to scrape multiple pages of Google search results. So far I can only scrape the first page. How can I scrape several pages?
from bs4 import BeautifulSoup
import requests
import urllib.request
import re
from collections import Counter

def search(query):
    url = "http://www.google.com/search?q=" + query
    text = []
    final_text = []
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    # Collect the result snippets and the result titles.
    for desc in soup.find_all("span", {"class": "st"}):
        text.append(desc.text)
    for title in soup.find_all("h3", attrs={"class": "r"}):
        text.append(title.text)
    # Keep only ASCII letters and spaces.
    for string in text:
        string = re.sub("[^A-Za-z ]", "", string)
        final_text.append(string)
    count_text = ' '.join(final_text)
    res = Counter(count_text.split())
    # Sort by descending count, then alphabetically by word.
    keyword_Count = dict(sorted(res.items(), key=lambda x: (-x[1], x[0])))
    for x, y in keyword_Count.items():
        print(x, " : ", y)

search("girl")
Best answer
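Google paginates results with the start query parameter, which is the zero-based offset of the first result. With 10 results per page, and page counted from 1, the URL for a given page is:
url = "http://www.google.com/search?q=" + query + "&start=" + str((page - 1) * 10)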
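To request several pages, the URL from the answer can be built once per page number. The following is a minimal sketch of that idea, not part of the original answer: the helper names page_url and page_urls and the constant RESULTS_PER_PAGE are hypothetical, and it assumes 10 results per page as the answer does. It uses urllib.parse.urlencode so that queries containing spaces or special characters are escaped.

```python
from urllib.parse import urlencode

RESULTS_PER_PAGE = 10  # assumption: 10 results per page, as in the answer


def page_url(query, page):
    """Build the search URL for a 1-based page number (hypothetical helper)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # start is the zero-based offset of the first result on the page.
    params = {"q": query, "start": (page - 1) * RESULTS_PER_PAGE}
    return "http://www.google.com/search?" + urlencode(params)


def page_urls(query, pages):
    """Return the URLs for pages 1 through `pages` (hypothetical helper)."""
    return [page_url(query, page) for page in range(1, pages + 1)]


# Print the URLs for the first three pages; no request is sent here.
for url in page_urls("girl", 3):
    print(url)
```

Inside search, each of these URLs could be passed to requests.get(url) in turn, appending to the same text list before the words are counted, so the final counts cover every page that was fetched.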
Regarding python - scraping Google search with BeautifulSoup, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53324849/