I want to get the redirected link from the original link, like this:
--> [Redirected URL] https://m.hani.co.kr/arti/culture/book/1088588.html
To do this, I wrote the following code:
import requests
link = "https://news.google.com/rss/articles/CBMiNWh0dHBzOi8vd3d3LmhhbmkuY28ua3IvYXJ0aS9jdWx0dXJlL2Jvb2svMTA4ODU4OC5odG1s0gEA?oc=5"
r = requests.get(link)
print(r.url)
This works for almost all links, but some links fail, such as the one below. What could be the cause?
link = "https://news.google.com/rss/articles/CBMiK2h0dHBzOi8vemRuZXQuY28ua3Ivdmlldy8_bm89MjAyMzA0MTYxMTA1NDnSAQA?oc=5"
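How can I fix the code so that it returns the redirected URL?
--> [Redirected URL] https://zdnet.co.kr/view/?no=20230416110549
Best answer
Try setting the right cookie and HTTP headers, then parse the response with an HTML parser to extract the link: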
import requests
from bs4 import BeautifulSoup
url = 'https://news.google.com/rss/articles/CBMiK2h0dHBzOi8vemRuZXQuY28ua3Ivdmlldy8_bm89MjAyMzA0MTYxMTA1NDnSAQA?oc=5'
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36'
}
cookies = {'CONSENT': 'YES+cb.20220419-08-p0.cs+FX+111'}
r = requests.get(url, headers=headers, cookies=cookies)
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.a['href'])
Prints:
https://zdnet.co.kr/view/?no=20230416110549
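As a possible offline cross-check, the two sample links in this question appear to carry the target URL inside the article ID itself, as URL-safe base64 with the padding stripped. The sketch below decodes it without any network request. The byte layout it relies on (a fixed three-byte prefix, one length byte, then the URL) is an assumption inferred only from these two links, not a documented Google News format, and the helper name `decode_embedded_url` is hypothetical. Links that do not match the assumed layout return None, in which case the request-based approach above is still needed.

```python
import base64
from urllib.parse import urlparse


def decode_embedded_url(news_link):
    """Return the URL embedded in a Google News article ID, or None.

    Assumption: the ID has the layout seen in the two sample links only.
    """
    # The article ID is the last path segment; "?oc=5" is not part of it.
    article_id = urlparse(news_link).path.rsplit("/", 1)[-1]
    # Restore the '=' padding that was stripped from the base64 text.
    padded = article_id + "=" * (-len(article_id) % 4)
    try:
        raw = base64.urlsafe_b64decode(padded)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None
    # Assumed layout: bytes 0x08 0x13 0x22, one length byte, then the URL.
    if len(raw) < 5 or not raw.startswith(b"\x08\x13\x22"):
        return None
    length = raw[3]
    if length >= 0x80:  # multi-byte lengths (long URLs) are not handled here
        return None
    url_bytes = raw[4:4 + length]
    if len(url_bytes) != length:
        return None
    try:
        url = url_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return url if url.startswith(("http://", "https://")) else None


link = "https://news.google.com/rss/articles/CBMiK2h0dHBzOi8vemRuZXQuY28ua3Ivdmlldy8_bm89MjAyMzA0MTYxMTA1NDnSAQA?oc=5"
print(decode_embedded_url(link))
```

For the first link in the question this returns the www.hani.co.kr address rather than the m.hani.co.kr one, because it only reads what is embedded in the ID and does not follow any further redirect performed by the target site.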
Regarding "python - How to get the redirected link from a Google News link using requests?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/76063646/