python - Log in to a page with NTLM authentication using Python mechanize

Tags: python authentication mechanize

I want to use mechanize to log in to a page and retrieve some information. But every attempt to authenticate fails with HTTP error 401, as shown below:

r = br.open('http://intra')
File "bui...e\_mechanize.py", line 203, in open
File "bui...g\mechanize\_mechanize.py", line 255, in _mech_open
mechanize._response.httperror_seek_wrapper: HTTP Error 401: Unauthorized

This is my code so far:

import mechanize
import cookielib

# Browser
br = mechanize.Browser()

# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
# br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Follow refresh 0 but don't hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

# If the protected site didn't receive the authentication data you would
# end up with a 401 error in your face
br.add_password('http://intra', 'myusername', 'mypassword')

# User-Agent (this is cheating, ok?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
# Open some site, let's pick a random one, the first that comes to mind:
# r = br.open('http://google.com')
r = br.open('http://intra')
html = r.read()

# Show the source
print html

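What am I doing wrong? When I access http://intra (an internal page) with, for example, Chrome, it pops up a window asking for the username/password once, and after that everything works fine.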

The dialog that pops up looks like this:

[Image: the browser's username/password login dialog]

Best answer

After a lot of research, I managed to find out the reason behind this.

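It turns out the site uses so-called NTLM authentication, which mechanize does not support. This command helps to find out which authentication mechanism a site uses: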

wget -O /dev/null -S http://www.the-site.com/
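The `-S` option makes wget print the server's response headers; the line to look for is `WWW-Authenticate`, which names the scheme the server expects (`NTLM`, often alongside `Negotiate`, for an NTLM-protected site). A minimal Python 3 sketch of the same probe follows. The helper names `auth_schemes` and `probe` are hypothetical, and the parser assumes each challenge arrives in its own `WWW-Authenticate` header, so a combined value such as `Negotiate, NTLM` would need extra splitting.

```python
import urllib.error
import urllib.request


def auth_schemes(www_authenticate_values):
    """Return the scheme names from a list of WWW-Authenticate values,
    e.g. ['Negotiate', 'NTLM'] or ['Basic']."""
    schemes = []
    for value in www_authenticate_values:
        stripped = value.strip()
        if stripped:
            # The scheme is the first token: 'Basic realm="intra"' -> 'Basic'
            schemes.append(stripped.split(None, 1)[0])
    return schemes


def probe(url):
    """Fetch url without credentials and list the offered auth schemes."""
    try:
        with urllib.request.urlopen(url):
            pass
    except urllib.error.HTTPError as err:
        if err.code == 401:
            # Assumes one challenge per WWW-Authenticate header line
            return auth_schemes(err.headers.get_all("WWW-Authenticate") or [])
        raise
    return []  # the page did not ask for authentication
```

For example, `print(probe("http://intra"))` should list `NTLM` among the schemes if the intranet site behaves as described in the answer.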

So the code was modified slightly:

import sys
import urllib2
import mechanize
from ntlm import HTTPNtlmAuthHandler

print("LOGIN...")
# Command-line arguments: user, password, url
user = sys.argv[1]
password = sys.argv[2]
url = sys.argv[3]

# Store the credentials for any realm under the target URL
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, user, password)
# Create the NTLM authentication handler (from the python-ntlm package)
auth_NTLM = HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman)

browser = mechanize.Browser()

# Keep every default handler except the robots.txt processor
handlersToKeep = []
for handler in browser.handlers:
    if not isinstance(handler, mechanize._http.HTTPRobotRulesProcessor):
        handlersToKeep.append(handler)

browser.handlers = handlersToKeep
browser.add_handler(auth_NTLM)

# The 401 challenge on the first request is answered by the NTLM handler
response = browser.open(url)
# Further pages are opened with the same browser object
response = browser.open("http://www.the-site.com")
print(response.read())
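A plain `add_password` is not enough because NTLM is a challenge-response handshake spread over several requests on the same connection: the client sends a NEGOTIATE message (type 1), the server answers 401 with a CHALLENGE message (type 2) in `WWW-Authenticate: NTLM <token>`, and the client replies with an AUTHENTICATE message (type 3). `HTTPNtlmAuthHandler` performs these steps. As an illustration only, the following Python 3 sketch decodes which message a header carries, relying on the fact that every NTLM message starts with the 8-byte signature `NTLMSSP\0` followed by a little-endian 32-bit message type. The function `ntlm_message_type` is hypothetical and not part of python-ntlm or mechanize.

```python
import base64
import struct

NTLM_SIGNATURE = b"NTLMSSP\x00"  # every NTLM message starts with this
MESSAGE_NAMES = {1: "NEGOTIATE", 2: "CHALLENGE", 3: "AUTHENTICATE"}


def ntlm_message_type(header_value):
    """Return the NTLM message type (1, 2 or 3) carried in an
    Authorization or WWW-Authenticate header value, or None."""
    parts = header_value.strip().split(None, 1)
    if len(parts) != 2 or parts[0].upper() != "NTLM":
        return None  # a bare "NTLM" challenge, or a different scheme
    try:
        raw = base64.b64decode(parts[1], validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None
    if len(raw) < 12 or not raw.startswith(NTLM_SIGNATURE):
        return None
    # Bytes 8-11 hold the MessageType as a little-endian 32-bit integer
    (msg_type,) = struct.unpack_from("<I", raw, 8)
    return msg_type
```

The very first 401 usually carries a bare `WWW-Authenticate: NTLM` without a token, which is why the function returns `None` for it; only the second 401 contains the type 2 CHALLENGE.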

Finally, mechanize needs to be patched, as mentioned here:

--- _response.py.old    2013-02-06 11:14:33.208385467 +0100
+++ _response.py    2013-02-06 11:21:41.884081708 +0100
@@ -350,8 +350,13 @@
             self.fileno = self.fp.fileno
         else:
             self.fileno = lambda: None
-        self.__iter__ = self.fp.__iter__
-        self.next = self.fp.next
+
+        if hasattr(self.fp, "__iter__"):
+            self.__iter__ = self.fp.__iter__
+            self.next = self.fp.next
+        else:
+            self.__iter__ = lambda: self
+            self.next = lambda: self.fp.readline()

     def __repr__(self):
         return '<%s at %s whose fp = %r>' % (
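The idea behind the patch, falling back to `readline()` when the wrapped file object has no `__iter__` (presumably the case for the response objects produced by the NTLM handler), can be tried in isolation. This Python 3 sketch uses hypothetical class names (`ReadlineOnly`, `ResponseWrapper`) and `iter(callable, sentinel)`, which stops cleanly at end of file because `readline()` returns an empty string there.

```python
import io


class ReadlineOnly:
    """Hypothetical file-like object with readline() but no __iter__."""

    def __init__(self, data):
        self._buf = io.StringIO(data)

    def readline(self):
        return self._buf.readline()


class ResponseWrapper:
    """Hypothetical wrapper mirroring the patched mechanize logic."""

    def __init__(self, fp):
        self.fp = fp

    def __iter__(self):
        if hasattr(self.fp, "__iter__"):
            # Delegate to the wrapped object's own iterator
            return iter(self.fp)
        # Fallback: call readline() until it returns "" (end of file)
        return iter(self.fp.readline, "")
```

Usage: `list(ResponseWrapper(ReadlineOnly("a\nb\n")))` yields the two lines, and iteration ends instead of looping forever on empty reads.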

Regarding "python - Log in to a page with NTLM authentication using Python mechanize", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/22224840/
