python-3.x - How to fix an HTTP error in Python 3 when using urlopen from urllib

Tags: python-3.x web-scraping urllib

I added a User-Agent to the request headers. Here is my code and the error:

from urllib.request import Request, urlopen
import json
from bs4 import BeautifulSoup
import time

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1)'}

domain=Request("http://online-courses.club/baugasm-series-8-design-abstract-textures-and-poster-with-acrylic-paint-photoshop-and-cinema-4d/",data=bytes(json.dumps(headers), encoding="utf-8"))
response =urlopen(domain)

I also tried a different version; note the change in the domain variable:

from urllib.request import Request, urlopen
import json
from bs4 import BeautifulSoup
import time

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1)'}

domain=Request("http://online-courses.club/baugasm-series-8-design-abstract-textures-and-poster-with-acrylic-paint-photoshop-and-cinema-4d/",headers)
response =urlopen(domain)

Neither version works. The error:

line 9, in <module>
    response =urlopen(domain)
  File "C:\Users\ABC\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\ABC\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Users\ABC\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\ABC\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Users\ABC\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\Users\ABC\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Best answer

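Use .add_header() to add a proper User-Agent. In both attempts above, the headers dict ends up in the data argument of Request (its second positional parameter), so it is never sent as request headers.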

For example:

from urllib.request import Request, urlopen

domain=Request("http://online-courses.club/baugasm-series-8-design-abstract-textures-and-poster-with-acrylic-paint-photoshop-and-cinema-4d/")
domain.add_header('User-Agent', 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0')
response =urlopen(domain)

print(response.read())

Prints:

b'<!DOCTYPE html>\r\n<html lang="en-US" prefix="og: http://ogp.me/ns#">\r\n<head itemscope="itemscope" itemtype="http://schema.org/WebSite">\r\n\t<meta charset="UTF-8" />

... and so on.
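
As an alternative to .add_header(), the same header can be passed through the headers keyword argument of Request. The sketch below compares that with the positional form from the question by inspecting the Request objects, without touching the network. The fetch helper is hypothetical and added only for illustration; whether a given User-Agent string is accepted is up to the server.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

URL = ("http://online-courses.club/baugasm-series-8-design-abstract-"
       "textures-and-poster-with-acrylic-paint-photoshop-and-cinema-4d/")
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) "
                  "Gecko/20100101 Firefox/78.0"
}

# The second positional parameter of Request is `data`, so the dict is
# treated as a request body: the method becomes POST and no header is set.
wrong = Request(URL, HEADERS)
print(wrong.get_method(), wrong.has_header("User-agent"))  # POST False

# Passed by keyword, the dict is stored as request headers on a GET.
# Request normalizes header names with str.capitalize(): "User-agent".
right = Request(URL, headers=HEADERS)
print(right.get_method(), right.has_header("User-agent"))  # GET True


def fetch(request, timeout=10):
    """Hypothetical helper: return the response body, or None on an HTTP error."""
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.read()
    except HTTPError as err:
        # err.code is the status (e.g. 403), err.reason the text (e.g. Forbidden)
        print("HTTP error:", err.code, err.reason)
        return None
```

Calling fetch(right) would then perform the actual download, and a 403 would be reported instead of raising a traceback.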

Source: a similar question on Stack Overflow, "How to fix an HTTP error in Python 3 when using urlopen from urllib": https://stackoverflow.com/questions/62733630/
