python - Mechanize select_form on the first form raises "ImportError: No module named html5lib"

Tags: python beautifulsoup mechanize html5lib

After reading this tutorial, I came up with the following code:

    import requests
    from bs4 import BeautifulSoup
    import re
    import mechanize
    import cookielib

    # Browser
    br = mechanize.Browser()

    # Cookie Jar
    cj = cookielib.LWPCookieJar()
    br.set_cookiejar(cj)

    # Browser options
    br.set_handle_equiv(True)
    br.set_handle_gzip(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)

    # Follows refresh 0 but does not hang on refresh > 0
    br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

    # User-Agent (this is cheating, ok?)
    br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

    # The site we will navigate into, handling its session
    br.open('http://www.cleanmetrics.net/foodcarbonscope')

    # Select the first form on the page and fill in the login fields
    br.select_form(nr=0)
    br.form['ctl00$ContentPlaceHolder1$userName'] = "XXXXX"
    br.form['ctl00$ContentPlaceHolder1$passWord'] = "XXXXXX"

    # Login
    br.submit()

I keep getting this error:
File "scrapeRecipe.py", line 30, in <module>
    br.select_form(nr=0)
  File "build/bdist.macosx-10.11-intel/egg/mechanize/_mechanize.py", line 619, in select_form
  File "build/bdist.macosx-10.11-intel/egg/mechanize/_html.py", line 260, in global_form
  File "build/bdist.macosx-10.11-intel/egg/mechanize/_html.py", line 267, in forms
  File "build/bdist.macosx-10.11-intel/egg/mechanize/_html.py", line 282, in _get_forms
  File "build/bdist.macosx-10.11-intel/egg/mechanize/_html.py", line 247, in root
  File "build/bdist.macosx-10.11-intel/egg/mechanize/_html.py", line 145, in content_parser
ImportError: No module named html5lib
However, I know that html5lib was installed successfully, because when I run pip3 freeze I get:
html5lib==0.999999999
six==1.10.0
webencodings==0.5.1
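
One thing worth checking here: pip3 freeze reports packages for whichever Python 3 installation pip3 belongs to, while the traceback and the py2.7 egg names suggest the script itself runs under Python 2.7. The sketch below prints which interpreter is running and whether it can import html5lib. The helper name is_importable is hypothetical, and the snippet has to be run with the same interpreter that runs scrapeRecipe.py for the result to be meaningful.

```python
import sys


def is_importable(module_name):
    """Return True if the running interpreter can import module_name."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True


# Show which interpreter is executing this file and its major.minor version.
print("interpreter: %s" % sys.executable)
print("version: %d.%d" % sys.version_info[:2])

# html5lib is only usable by mechanize if *this* interpreter can import it.
print("html5lib importable: %s" % is_importable("html5lib"))
```

If this prints a 2.7 interpreter and reports that html5lib is not importable, the package was most likely installed for a different Python than the one running the script.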
Update:
I think my problem may be related to my easy-install.pth file. In my site-packages directory I don't actually see html5lib. All I have is this:
BeautifulSoup-3.2.1-py2.7.egg
appdirs-1.4.3.dist-info
appdirs.py
appdirs.pyc
beautifulsoup4-4.5.3.dist-info
bs4
easy-install.pth
html2text-2016.9.19-py2.7.egg
mechanize-0.3.1-py2.7.egg
packaging
packaging-16.8.dist-info
pip-9.0.1-py2.7.egg
requests-2.13.0-py2.7.egg
When I run easy_install html5lib, I get "Adding html5lib 0.999999999 to easy-install.pth file". However, after it finished processing dependencies for html5lib, I opened my easy-install.pth file and found no mention of html5lib anywhere:
   import sys; sys.__plen = len(sys.path)
   ./BeautifulSoup-3.2.1-py2.7.egg
   ./html2text-2016.9.19-py2.7.egg
   ./mechanize-0.3.1-py2.7.egg
   ./requests-2.13.0-py2.7.egg
   ./pip-9.0.1-py2.7.egg
   import sys; new=sys.path[sys.__plen:]; del sys.path[sys.__plen:]; p=getattr(sys,'__egginsert',0); sys.path[p:p]=new; sys.__egginsert = p+len(new)
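Unless html5lib is inside one of the packages above? I'm wondering whether I need to import html5lib in my Python code and specify its root path.
I'm really not sure why this got downvoted. :/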
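
To make the easy-install.pth contents shown above easier to reason about, here is a minimal sketch of how such a file is read. According to the documentation of Python's site module, blank lines and lines starting with # are skipped, lines starting with "import" followed by a space or tab are executed, and every other line is treated as a path to add to sys.path. The helper name pth_path_entries and the SAMPLE_PTH string are illustrative only, the last import line is shortened, and the parsing is a simplification of what site actually does.

```python
def pth_path_entries(pth_text):
    """Return the path entries of a .pth file, skipping executable lines.

    Simplified model of the site module's rules: blank lines and comments
    are ignored, "import ..." lines are code (not paths), and everything
    else is a directory or egg to be added to sys.path.
    """
    entries = []
    for line in pth_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if stripped.startswith(("import ", "import\t")):
            continue
        entries.append(stripped)
    return entries


# Contents modelled on the easy-install.pth from the question
# (the final import line is abbreviated here).
SAMPLE_PTH = """\
import sys; sys.__plen = len(sys.path)
./BeautifulSoup-3.2.1-py2.7.egg
./html2text-2016.9.19-py2.7.egg
./mechanize-0.3.1-py2.7.egg
./requests-2.13.0-py2.7.egg
./pip-9.0.1-py2.7.egg
import sys; new=sys.path[sys.__plen:]; del sys.path[sys.__plen:]
"""

entries = pth_path_entries(SAMPLE_PTH)
print(entries)
print(any("html5lib" in entry for entry in entries))
```

A package does not have to appear in easy-install.pth to be importable: packages installed as plain directories in site-packages are found without a .pth entry. An html5lib that is missing from both the .pth file and the directory listing may have ended up in a different site-packages directory, possibly one belonging to another interpreter.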

Accepted answer

I've since run into a different problem, but here is what fixed the html5lib error:

pip install --ignore-installed six --user
sudo -H pip install html5lib --ignore-installed

For more background, this is a good thread: https://github.com/pypa/pip/issues/3165
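
As a complement to the commands above, and not a replacement for them, a common way to avoid installing into the wrong Python is to invoke pip as a module of the interpreter that runs the script (python -m pip ...). The sketch below only builds and prints such a command. The helper name pip_install_command is hypothetical, and whether this alone would have fixed the original setup is an assumption.

```python
import sys


def pip_install_command(package, ignore_installed=True):
    """Build a pip command bound to the interpreter running this file."""
    command = [sys.executable, "-m", "pip", "install"]
    if ignore_installed:
        # Same flag as in the answer: reinstall even if already present.
        command.append("--ignore-installed")
    command.append(package)
    return command


command = pip_install_command("html5lib")
print(" ".join(command))

# To actually run the installation, uncomment the two lines below.
# import subprocess
# subprocess.check_call(command)
```

Running this with the interpreter that executes scrapeRecipe.py yields a command that installs html5lib for exactly that interpreter, regardless of which pip or pip3 executable is first on the PATH.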

Source: this question was originally asked on Stack Overflow: https://stackoverflow.com/questions/43646368/
