python - “urllib.error.HTTPError: HTTP Error 404: Not Found” Python

Tags: python http web error-handling urllib

I am trying to open this web page with the urllib.request.urlopen function:
https://prenotaonline.esteri.it/login.aspx?cidsede=100001&returnUrl=//

I can access the page in a regular browser, but urllib.request.urlopen still returns HTTP error 404:

import urllib.request


page = urllib.request.urlopen("https://prenotaonline.esteri.it/login.aspx?cidsede=100001&returnUrl=//").read()
print(page)

I get the following error:
Traceback (most recent call last):
  File "/Users/markmouawad/Documents/consu_programa/scrapper.py", line 4, in <module>
    page = urllib.request.urlopen("https://prenotaonline.esteri.it/login.aspx?cidsede=100001&returnUrl=//").read()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 472, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 582, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 510, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 444, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 590, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

I am using Python 3.5.3.

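Best answer

This is one of the first things you run into when writing a spider or crawler bot.

A basic way for a server to detect bots is to inspect the User-Agent header of the request. By default urllib identifies itself as Python-urllib/3.5 (for Python 3.5), which a server can easily recognize and refuse.

Try the following snippet, which sends a browser-like User-Agent: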

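import requests

# URL from the question
URL = "https://prenotaonline.esteri.it/login.aspx?cidsede=100001&returnUrl=//"

# Send a browser-like User-Agent instead of the library default
headers = {'User-Agent': 'Mozilla/5.0 (iPad; U; CPU OS 3_2_1 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7B405'}

r = requests.get(URL, headers=headers)

print(r.status_code)  # should be 200
print(r.content)  # should hold page content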
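
Since the question uses urllib rather than requests, the same idea can be sketched with the standard library alone. This is a minimal sketch, not a guaranteed fix: the helper names build_request and fetch and the BROWSER_UA string are hypothetical choices for illustration, and whether a given server accepts this User-Agent is an assumption.

```python
import urllib.error
import urllib.request

# Assumed browser-like User-Agent string; any realistic value could be used.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that sends a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=BROWSER_UA, timeout=10):
    """Return (status, body); on an HTTP error, return the error code and body."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # HTTPError carries the status code and the error page body.
        return err.code, err.read()
```

Calling fetch with the URL from the question returns the status code and the page bytes, so a 404 can be inspected instead of ending the script with a traceback.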

Source: a similar question on Stack Overflow about "urllib.error.HTTPError: HTTP Error 404: Not Found": https://stackoverflow.com/questions/48489443/
