python - Why am I getting a connection refused exception from this Python script?

Tags: python web-scraping beautifulsoup python-requests urllib

I am writing a Python script that uses the requests module to fetch a song's lyrics from azlyrics. This is the script I wrote:

import requests, re
from bs4 import BeautifulSoup as bs
url = "http://search.azlyrics.com/search.php"
payload = {'q' : 'shape of you'}
r = requests.get(url, params = payload)
soup = bs(r.text,"html.parser")
try:
    link = soup.find('a', {'href':re.compile('http://www.azlyrics.com/lyrics/edsheeran/shapeofyou.html')})['href']
    link = link.replace('http', 'https')
    print(link)
    raw_data = requests.get(link)
except Exception as e: 
    print(e)

But I get an exception:

Max retries exceeded with url: /lyrics/edsheeran/shapeofyou.html (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7fbda00b37f0>: Failed to establish a new connection: [Errno 111] Connection refused',))

I read online that I might be sending too many requests, so I made the script sleep for a while before the second request:

import requests, re
from bs4 import BeautifulSoup as bs
from time import sleep
url = "http://search.azlyrics.com/search.php"
payload = {'q' : 'shape of you'}
r = requests.get(url, params = payload)
soup = bs(r.text,"html.parser")
try:
    link = soup.find('a', {'href':re.compile('http://www.azlyrics.com/lyrics/edsheeran/shapeofyou.html')})['href']
    link = link.replace('http', 'https')
    sleep(60)
    print(link)
    raw_data = requests.get(link)
except Exception as e: 
    print(e)

But no luck!

So I tried the same thing with urllib.request:

import requests, re
from bs4 import BeautifulSoup as bs
from time import sleep
from urllib.request import urlopen
url = "http://search.azlyrics.com/search.php"
payload = {'q' : 'shape of you'}
r = requests.get(url, params = payload)
soup = bs(r.text,"html.parser")
try:
    link = soup.find('a', {'href':re.compile('http://www.azlyrics.com/lyrics/edsheeran/shapeofyou.html')})['href']
    link = link.replace('http', 'https')
    sleep(60)
    print(link)
    raw_data = urlopen(link).read()
except Exception as e: 
    print(e)

But then I got the same failure with a differently worded exception:

<urlopen error [Errno 111] Connection refused>

Can anyone tell me what is wrong with it and how to fix it?

Best answer

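Try both URLs in your web browser: http://www.azlyrics.com/lyrics/edsheeran/shapeofyou.html loads fine, but https://www.azlyrics.com/lyrics/edsheeran/shapeofyou.html does not. The site is not accepting HTTPS connections, and that is the "[Errno 111] Connection refused" your script reports once it rewrites the link to https.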
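The same check can be made without a browser by testing whether the host accepts a TCP connection on the port each scheme implies. The sketch below is not from the original answer: the helper names port_for and can_connect are made up for illustration, and it assumes the default ports (80 for http, 443 for https) unless the URL names one explicitly.

```python
import socket
from urllib.parse import urlsplit


def port_for(url):
    """Return the TCP port a URL implies: an explicit port, else 443 for https, else 80."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return 443 if parts.scheme == "https" else 80


def can_connect(url, timeout=5.0):
    """Return True if a TCP connection to the URL's host and port can be opened.

    Any OSError (connection refused, timeout, DNS failure) is reported as False.
    """
    host = urlsplit(url).hostname
    try:
        with socket.create_connection((host, port_for(url)), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling can_connect on the http:// and the https:// form of the lyrics URL should show which one the server accepts. Because the helper returns False for any OSError, a False result by itself does not distinguish a refused connection from a timeout or a DNS problem.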

So remove the line link = link.replace('http', 'https') and try again.
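If the scheme ever does need to change (for example, should the site start serving HTTPS later), str.replace is a fragile tool for it, because it rewrites every occurrence of 'http' in the string rather than only the scheme. A minimal sketch of a safer alternative follows; with_scheme is a hypothetical helper name, not something from the question or the answer.

```python
from urllib.parse import urlsplit, urlunsplit


def with_scheme(url, scheme):
    """Return url with only its scheme swapped; host, path and query are left untouched."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(scheme=scheme))


# What the question's approach does to a link that is already https:
broken = "https://www.azlyrics.com/".replace("http", "https")

# The same input through the helper keeps a valid scheme:
fixed = with_scheme("https://www.azlyrics.com/", "https")
```

Here broken ends up starting with "httpss://", which is not a valid scheme, while fixed is returned unchanged. For the script in the question no rewriting is needed at all: the link found in the search results can be passed to requests.get as it is.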

Regarding "python - Why am I getting a connection refused exception from this Python script?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43630070/
