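I want to retrieve the different categories from a news website. I'm using BeautifulSoup to get the article titles on the right-hand side. How can I loop through the various categories available on the left-hand side of the site? I've only just started learning this kind of code and don't really understand how it works, so any help would be greatly appreciated. This is the site I'm working on: http://query.nytimes.com/search/sitesearch/#/ */

Below is my code, which returns the titles of the various articles on the right-hand side: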
```python
import json
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

# The search page loads its results from this JSON endpoint,
# so no HTML parsing (BeautifulSoup) is needed here.
try:
    resp = urlopen("https://query.nytimes.com/svc/add/v1/sitesearch.json")
except (HTTPError, URLError) as err:
    raise SystemExit("Request failed: {}".format(err))

j = json.loads(resp.read().decode("utf-8"))
articles = j['response']['docs']
headlines = [article['headline']['main'] for article in articles]
for headline in headlines:
    print(headline)
```
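Because the endpoint returns JSON, the parsing step can be kept separate from the network call, which also makes it easy to skip documents that lack a headline. The sketch below assumes the response shape used in the code above (`response` → `docs` → `headline` → `main`); `extract_headlines` is a hypothetical helper name, and the sample payload is made-up data, not a real API response.

```python
def extract_headlines(payload):
    """Return the main headline of every doc in a sitesearch.json payload.

    Assumes the shape used above: payload['response']['docs'][i]['headline']['main'].
    Docs without a headline are skipped instead of raising KeyError.
    """
    docs = payload.get('response', {}).get('docs', [])
    headlines = []
    for doc in docs:
        main = (doc.get('headline') or {}).get('main')
        if main:
            headlines.append(main)
    return headlines


# Hypothetical sample payload with invented titles, for illustration only
sample = {
    'response': {
        'docs': [
            {'headline': {'main': 'First story'}},
            {'headline': {}},
            {'headline': {'main': 'Second story'}},
        ]
    }
}

for h in extract_headlines(sample):
    print(h)
```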
Best answer
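If I understand correctly, you can get those articles by changing the API query, for example by setting its `begin_date` parameter to each date range: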
```python
import requests

data_range = ['24hours', '7days', '30days', '365days']
news_feed = {}
with requests.Session() as s:
    for rng in data_range:
        # One request per date range, e.g. begin_date=7daysago
        news_feed[rng] = s.get('http://query.nytimes.com/svc/add/v1/sitesearch.json?begin_date={}ago&facet=true'.format(rng)).json()
```
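Instead of formatting the query string by hand, it can be built from a dictionary of parameters, which avoids escaping mistakes when more filters are added (`requests` can do the same through the `params` argument of `get`). This is a minimal sketch using only the standard library; `build_query_url` is a hypothetical helper, and the parameter names are simply the ones already used in the URL above.

```python
from urllib.parse import urlencode

BASE_URL = 'http://query.nytimes.com/svc/add/v1/sitesearch.json'


def build_query_url(date_range, page=None):
    """Build a sitesearch.json URL for one date-range filter such as '7days'."""
    params = {'begin_date': date_range + 'ago', 'facet': 'true'}
    if page is not None:
        params['page'] = page
    # urlencode keeps the dict's insertion order and escapes values
    return BASE_URL + '?' + urlencode(params)


for rng in ['24hours', '7days', '30days', '365days']:
    print(build_query_url(rng))
```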
You can then access the values in `news_feed` like this:
```python
print(news_feed)  # or print(news_feed['30days'])
```
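Printing the whole dictionary dumps every raw payload. To see just the titles for each date range, you can walk the nested keys per range. The sketch below assumes each value in `news_feed` has the same `response` → `docs` → `headline` → `main` shape as in the question; `headlines_by_range` is a hypothetical helper, and the `news_feed` literal is invented sample data standing in for real responses.

```python
def headlines_by_range(news_feed):
    """Map each date-range key to the list of main headlines in its payload."""
    result = {}
    for rng, payload in news_feed.items():
        docs = payload.get('response', {}).get('docs', [])
        result[rng] = [
            (doc.get('headline') or {}).get('main')
            for doc in docs
            if (doc.get('headline') or {}).get('main')
        ]
    return result


# Invented sample data shaped like the API responses
news_feed = {
    '24hours': {'response': {'docs': [{'headline': {'main': 'Story A'}}]}},
    '7days': {'response': {'docs': [{'headline': {'main': 'Story A'}},
                                    {'headline': {'main': 'Story B'}}]}},
}

for rng, titles in headlines_by_range(news_feed).items():
    print(rng, titles)
```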
Edit
To query additional result pages, you can try the following:
```python
import requests

URL = 'http://query.nytimes.com/svc/add/v1/sitesearch.json'
data_range = ['7days']
news_feed = {}
with requests.Session() as s:
    for rng in data_range:
        news_list = []  # start a fresh list (and page count) for each date range
        for page in range(1, 20):  # the page number is limited to 120
            resp = s.get(URL, params={'begin_date': rng + 'ago', 'page': page, 'facet': 'true'})
            news_list.append(resp.json())
        news_feed[rng] = news_list

for new in news_feed['7days']:
    print(new)
```
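Requesting a fixed number of pages may waste calls once the results run out. A minimal sketch of stopping early is shown below, assuming (unverified for this endpoint) that a page past the last one comes back with an empty `docs` list. `fetch_all_pages` and `fake_fetch` are hypothetical names; `fake_fetch` stands in for the real request so the logic can run offline. Pages start at 1 and stop before `stop_page`, mirroring the loop above.

```python
def fetch_all_pages(fetch_page, stop_page=20):
    """Collect docs from pages 1 .. stop_page - 1, stopping at the first empty page.

    fetch_page(page) must return one decoded sitesearch.json payload.
    """
    all_docs = []
    for page in range(1, stop_page):
        docs = fetch_page(page).get('response', {}).get('docs', [])
        if not docs:
            break  # assumed: no more results beyond this page
        all_docs.extend(docs)
    return all_docs


def fake_fetch(page):
    # Hypothetical stand-in: pretend the server has 3 pages of 2 docs each
    if page <= 3:
        return {'response': {'docs': [{'page': page, 'i': i} for i in range(2)]}}
    return {'response': {'docs': []}}


docs = fetch_all_pages(fake_fetch)
print(len(docs))  # 6
```

With the real endpoint, `fetch_page` could be something like `lambda page: s.get(URL, params={'begin_date': '7daysago', 'page': page, 'facet': 'true'}).json()` inside the `requests.Session()` block.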
Regarding "python - How to retrieve different categories from a news website using BeautifulSoup", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/49434230/