python - certificate expired, can't use verify=True; requests.exceptions.SSLError certificate verify failed

Tags: python web-scraping beautifulsoup python-requests

I'm a real beginner in Python and have learned basically everything from the internet, so please forgive me if I haven't grasped all the concepts correctly.

My problem is that I'm trying to write a web scraper using requests and BeautifulSoup. For two days I've been getting an error that the certificate has expired, and the same thing happens if I open the site in my browser - I can't even add it as an exception there.
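Before patching the scraper, it can help to confirm - independently of requests - that the server's certificate really has expired. A minimal stdlib sketch (the function names are mine, not from the question): a verifying handshake will raise ssl.SSLCertVerificationError for an expired chain, which shows the problem is on the server side, not in your code.

```python
import socket
import ssl
from datetime import datetime

def fetch_peer_cert(host, port=443):
    """Handshake with full certificate verification and return the
    parsed peer certificate. An expired chain raises
    ssl.SSLCertVerificationError, mirroring what requests reports."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()

def parse_not_after(cert):
    """Turn a certificate dict's 'notAfter' string into a datetime,
    so you can see exactly when the certificate expired."""
    return datetime.strptime(cert["notAfter"], "%b %d %H:%M:%S %Y %Z")
```

For a site with a valid certificate, `parse_not_after(fetch_peer_cert("example.org"))` returns the expiry date; for the site in the question it raises the same CERTIFICATE_VERIFY_FAILED error shown in the traceback below.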

Here is my code:

import requests
from bs4 import BeautifulSoup

def project_spider(max_pages):
    for page in range(1, max_pages + 1):
        url = 'https://hubbub.org/projects/?page=' + str(page)
        # Fetch the listing page; verify=False skips certificate checks
        try:
            source_code = requests.get(url, allow_redirects=False, timeout=15, verify=False)
        except (requests.exceptions.RequestException, IOError):
            print('Failed to open url.')
            continue  # skip this page instead of using an undefined response
        # Parse the whole page
        soup = BeautifulSoup(source_code.text, 'html.parser')
        # Find every div whose class list matches the project-card columns
        data = soup.findAll('div', attrs={'class': 'col-xs-12 col-sm-6 col-md-4 col-lg-3'})
        # For every matching div in the data
        for div in data:
            # Search each div for links (a tags with an href)
            links = div.findAll('a', href=True)
            names = div.find('h4').contents[0]
            print(names)
            for a in links:
                links2 = a['href']
                print(links2)
                get_single_item_data(links2)

An expert would probably write this differently. Anyway, I tried to fix it with verify=False and with Session(), but it didn't work. I also tried to skip the page in (5), but couldn't. At this point I'm really desperate, because all I keep getting is this error:

https://rabbitraisers.org/p/fantasticfloats/
Traceback (most recent call last):
  File "C:\Users\stockisa\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "C:\Users\stockisa\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 343, in _make_request
    self._validate_conn(conn)
  File "C:\Users\stockisa\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 849, in _validate_conn
    conn.connect()
  File "C:\Users\stockisa\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connection.py", line 356, in connect
    ssl_context=context)
  File "C:\Users\stockisa\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\util\ssl_.py", line 359, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "C:\Users\stockisa\AppData\Local\Programs\Python\Python37\lib\ssl.py", line 412, in wrap_socket
    session=session
  File "C:\Users\stockisa\AppData\Local\Programs\Python\Python37\lib\ssl.py", line 850, in _create
    self.do_handshake()
  File "C:\Users\stockisa\AppData\Local\Programs\Python\Python37\lib\ssl.py", line 1108, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1045)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\stockisa\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\adapters.py", line 445, in send
    timeout=timeout
  File "C:\Users\stockisa\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "C:\Users\stockisa\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\util\retry.py", line 398, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='rabbitraisers.org', port=443): Max retries exceeded with url: /p/fantasticfloats/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1045)')))

Best Answer

Import this at the top of your source file:

from requests.packages.urllib3.exceptions import InsecureRequestWarning

Then put this as one of the first lines of your project_spider function:

requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
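One caveat worth spelling out: disable_warnings only silences the InsecureRequestWarning message - it is verify=False that actually bypasses the expired-certificate check. A session lets you set that once instead of on every requests.get call. A minimal sketch (the helper name is mine, not from the answer), to be used only for a site whose expired certificate you have consciously decided to trust:

```python
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning

def make_insecure_session():
    """Build a requests.Session that skips TLS certificate verification
    and silences the resulting warning once.

    NOTE: this disables a security check; use it only for hosts whose
    expired certificate you have decided to trust anyway."""
    requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
    session = requests.Session()
    session.verify = False  # applies to every request made through this session
    return session
```

With this in place, the scraper can call `session.get(url, allow_redirects=False, timeout=15)` without repeating verify=False, and the warning is printed zero times instead of once per request.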

Regarding "python - certificate expired, can't use verify=True; requests.exceptions.SSLError certificate verify failed", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52896088/
