python - How to retrieve different categories from a news website using BeautifulSoup

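I want to retrieve the different categories from a news website. I am using BeautifulSoup to get the article titles on the right-hand side. How can I loop through the various categories available on the left-hand side of the site? I have only just started learning this kind of code and do not yet understand well how it works. Any help would be appreciated. This is the website I am working with: http://query.nytimes.com/search/sitesearch/#/*/ Below is my code, which returns the titles of the various articles on the right-hand side: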

import json
from urllib.request import urlopen  # Python 3; the original used urllib2 (Python 2)

# BeautifulSoup is not needed here: this endpoint returns JSON, not HTML.
resp = urlopen("https://query.nytimes.com/svc/add/v1/sitesearch.json")

content = resp.read()
j = json.loads(content)

# Each search result is a document under response -> docs
articles = j['response']['docs']
headlines = [article['headline']['main'] for article in articles]
for headline in headlines:
    print(headline)

Best answer

If I understand correctly, you can get those articles by changing the API query, like this:

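import requests

data_range = ['24hours', '7days', '30days', '365days']
news_feed = {}

# Reuse one session so the connection is shared across requests
with requests.Session() as s:
    for rng in data_range:
        # begin_date takes values such as "7daysago"; facet=true is added to the query
        news_feed[rng] = s.get('http://query.nytimes.com/svc/add/v1/sitesearch.json?begin_date={}ago&facet=true'.format(rng)).json()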

And access the values like this:

print(news_feed)  # or print(news_feed['30days'])
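
Printing `news_feed` dumps the whole JSON payload for every range. As a minimal sketch, assuming each per-range response has the same `response` -> `docs` -> `headline` -> `main` layout that the question's code reads (the answer does not confirm this), the headlines could be pulled out per range as shown below. The helper names `extract_headlines` and `headlines_by_range` and the `sample_feed` data are hypothetical and only for illustration.

```python
def extract_headlines(payload):
    """Return the main headline of every document in one parsed response.

    Assumes the response -> docs -> headline -> main layout used in the
    question's code; a payload without documents yields an empty list.
    """
    docs = payload.get('response', {}).get('docs', [])
    return [doc['headline']['main'] for doc in docs]


def headlines_by_range(news_feed):
    """Map each date range key (e.g. '7days') to its list of headlines."""
    return {rng: extract_headlines(payload) for rng, payload in news_feed.items()}


# Made-up data shaped like the assumed response, so this runs offline.
sample_feed = {
    '24hours': {'response': {'docs': [{'headline': {'main': 'Headline A'}}]}},
    '7days': {'response': {'docs': [{'headline': {'main': 'Headline B'}},
                                    {'headline': {'main': 'Headline C'}}]}},
    '30days': {'response': {'docs': []}},
}

for rng, titles in headlines_by_range(sample_feed).items():
    print(rng, titles)
```

With the real data, `headlines_by_range(news_feed)` would take the dictionary built in the loop above in place of `sample_feed`.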

Edit

To query additional pages, you can try the following:

import requests

data_range = ['7days']
news_feed = {}

with requests.Session() as s:
    for rng in data_range:
        news_list = []  # start a fresh list for each date range
        for page in range(1, 20):  # pages 1-19; paging is limited to 120
            news_list.append(s.get('http://query.nytimes.com/svc/add/v1/sitesearch.json?begin_date={}ago&page={}&facet=true'.format(rng, page)).json())
        news_feed[rng] = news_list

# Each item is the parsed JSON of one result page
for new in news_feed['7days']:
    print(new)
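
The query strings above are assembled by hand with `str.format`. A possible alternative, sketched here under assumptions, builds them with the standard library's `urlencode` and then flattens the list of page responses into a single list of headlines. The function names `build_search_url` and `collect_headlines` are hypothetical, and the `response` -> `docs` -> `headline` -> `main` layout is assumed from the question's code rather than confirmed by the answer.

```python
from urllib.parse import urlencode

BASE_URL = 'http://query.nytimes.com/svc/add/v1/sitesearch.json'


def build_search_url(begin_date, page=None, facet=True):
    """Build the search URL for one date range and, optionally, one page.

    begin_date is a value such as '7days'; 'ago' is appended as in the
    answer's query strings.
    """
    params = {'begin_date': begin_date + 'ago'}
    if page is not None:
        params['page'] = page
    if facet:
        params['facet'] = 'true'
    return BASE_URL + '?' + urlencode(params)


def collect_headlines(pages):
    """Flatten a list of parsed page responses into one list of headlines."""
    headlines = []
    for payload in pages:
        docs = payload.get('response', {}).get('docs', [])
        headlines.extend(doc['headline']['main'] for doc in docs)
    return headlines


# Made-up pages shaped like the assumed response, so this runs offline.
sample_pages = [
    {'response': {'docs': [{'headline': {'main': 'First'}}]}},
    {'response': {'docs': [{'headline': {'main': 'Second'}},
                           {'headline': {'main': 'Third'}}]}},
]

print(build_search_url('7days', page=2))
print(collect_headlines(sample_pages))
```

In the paging loop, `s.get(build_search_url(rng, page)).json()` would replace the formatted string, and `collect_headlines(news_feed['7days'])` would replace the final print loop.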

This question and its answer come from Stack Overflow: https://stackoverflow.com/questions/49434230/
