python - Getting HTML content from an https site in Python

Tags: python ssl https

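I want to fetch the HTML code of a site and write it to a file. It works for http sites, but when the link is SSL (https), I get a lot of errors. Any idea how to handle it?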

from __future__ import print_function
import io
import os
import re
import ssl
from urllib.request import urlopen

with io.open('words.txt', 'a', encoding="utf-8") as g:
    url = "https://www.something.some"
    html = urlopen(url).read().decode("utf-8")  # read() returns bytes; decode to str before writing
    print(html, file=g)
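
`urlopen(url).read()` returns `bytes`, not `str`, so printing it directly writes the `b'...'` representation instead of the page text. A minimal sketch of a more careful conversion, assuming the server declares its encoding in the `Content-Type` header and falling back to UTF-8 when it does not (that fallback is a guess, not something the page guarantees). The helper name `read_html` is hypothetical.

```python
from urllib.request import urlopen


def read_html(url):
    """Download `url` and return its body as text (hypothetical helper)."""
    with urlopen(url) as resp:
        raw = resp.read()  # always bytes
        # Charset from a header such as "text/html; charset=utf-8".
        # Assumption: UTF-8 is a reasonable fallback when none is declared.
        charset = resp.headers.get_content_charset() or "utf-8"
    # errors="replace" keeps going if a few bytes do not match the charset.
    return raw.decode(charset, errors="replace")
```

In the question's script this would replace the `urlopen(url).read()` line, e.g. `html = read_html(url)` followed by `print(html, file=g)`.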

Here is the error:

Traceback (most recent call last):
  File "...\Desktop\mined.py", line 54, in <module>
    html = urlopen(url).read()
  File "...\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "....\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 472, in open
    response = meth(req, response)
  File "...\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 582, in http_response
    'http', request, response, code, msg, hdrs)
  File "...\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 510, in error
    return self._call_chain(*args)
  File "...\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 444, in _call_chain
    result = func(*args)
  File "...\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 590, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
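
The last line of the traceback is `urllib.error.HTTPError: HTTP Error 403: Forbidden`: the server answered with a 403 status code, which implies the TLS handshake itself completed. A certificate or handshake failure would typically surface instead as a `urllib.error.URLError` whose `reason` is an `ssl.SSLError`. The sketch below shows one way to tell these cases apart; the helpers `describe_failure` and `try_fetch` are hypothetical names used only for illustration.

```python
import ssl
import urllib.error
from urllib.request import urlopen


def describe_failure(exc):
    """Classify an exception raised by urlopen() (hypothetical helper)."""
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return "HTTP status {}: the TLS handshake succeeded".format(exc.code)
    if isinstance(exc, urllib.error.URLError):
        # Handshake/certificate failures usually arrive wrapped like this.
        if isinstance(exc.reason, ssl.SSLError):
            return "TLS/certificate problem: {}".format(exc.reason)
        return "connection problem: {}".format(exc.reason)
    return "unexpected error: {!r}".format(exc)


def try_fetch(url):
    """Return the body bytes, or None after printing what went wrong."""
    try:
        with urlopen(url) as resp:
            return resp.read()
    except urllib.error.URLError as exc:  # also catches HTTPError
        print(describe_failure(exc))
        return None
```

For the question's URL, `try_fetch(url)` would be expected to print a line starting with `HTTP status 403`, pointing at a server-side refusal rather than an SSL problem.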

Best answer

I would do it like this:

from urllib.request import urlopen  # Python 3 (urllib.urlopen only exists in Python 2)

resp = urlopen('https://somewebsite.com')        # open url
page = resp.read().decode('utf-8')               # copy website source to 'page' (bytes -> str)
with open("Output.txt", "w", encoding="utf-8") as text_file:  # open txt file
    text_file.write(page)                        # insert website source into txt file
# the with-block closes the file automatically
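
The answer above saves the page but does not address the 403 from the traceback. One common, though not guaranteed, cause is that the site refuses requests carrying urllib's default `Python-urllib/3.x` User-Agent. A minimal sketch, assuming that is what this particular site does, sends a browser-like `User-Agent` header through `urllib.request.Request` and writes the decoded page to a file. The header value and the helper names `make_request` and `save_page` are illustrative assumptions.

```python
from urllib.request import Request, urlopen

# Illustrative browser-like value (an assumption); any string the server accepts would do.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def make_request(url):
    """Build a Request that overrides urllib's default User-Agent."""
    return Request(url, headers={"User-Agent": USER_AGENT})


def save_page(url, path):
    """Fetch `url` with the custom User-Agent, write the HTML to `path`, return it."""
    with urlopen(make_request(url)) as resp:
        # Use the declared charset if present; UTF-8 is an assumed fallback.
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    # Text mode with an explicit encoding avoids writing bytes to a str file.
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return html
```

For example, `save_page("https://somewebsite.com", "Output.txt")`. If the server still answers 403, it is probably refusing the request for some other reason (cookies, rate limiting, bot detection), and changing the header alone will not help.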

For python - Getting HTML content from an https site in Python, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/40241718/
