python - How do I fix this kind of ClientForm error?

Tags: python mechanize clientform

from mechanize import Browser

br = Browser()
page = br.open('http://wow.interzet.ru/news.php?readmore=23')
br.form = br.forms().next()
print br.form

gives me the following error:

Traceback (most recent call last):
  File "C:\Users\roddik\Desktop\mech.py", line 6, in <module>
    br.form = br.forms().next()
  File "build\bdist.win32\egg\mechanize\_mechanize.py", line 426, in forms
  File "D:\py26\lib\site-package\mechanize-0.1.11-py2.6.egg\mechanize\_html.py", line 559, in forms
  File "D:\py26\lib\site-packages\mechanize-0.1.11-py2.6.egg\mechanize\_html.py", line 225, in forms
  File "D:\py26\lib\site-packages\clientform-0.2.10-py2.6.egg\ClientForm.py", line 967, in ParseResponseEx
  File "D:\py26\lib\site-packages\clientform-0.2.10-py2.6.egg\ClientForm.py", line 1100, in _ParseFileEx
  File "D:\py26\lib\site-packages\clientform-0.2.10-py2.6.egg\ClientForm.py", line 870, in feed
  File "D:\py26\lib\sgmllib.py", line 104, in feed
    self.goahead(0)
  File "D:\py26\lib\sgmllib.py", line 138, in goahead
    k = self.parse_starttag(i)
  File "D:\py26\lib\sgmllib.py", line 290, in parse_starttag
    self._convert_ref, attrvalue)
  File "D:\py26\lib\sgmllib.py", line 302, in _convert_ref
    return self.convert_charref(match.group(2)) or \
  File "D:\py26\lib\site-packages\clientform-0.2.10-py2.6.egg\ClientForm.py", line 850, in convert_charref
  File "D:\py26\lib\site-packages\clientform-0.2.10-py2.6.egg\ClientForm.py", line 244, in unescape_charref

ValueError: invalid literal for int() with base 10: 'e'

How do I fix it?

Edit:

I have worked around it as follows. Is this OK? If not, what should I do instead?

import ClientForm
from mechanize import Browser

def myunescape_charref(data, encoding):
    if not str(data).isdigit():
        return 0
    name, base = data, 10
    if name.startswith("x"):
        name, base = name[1:], 16
    uc = unichr(int(name, base))
    if encoding is None:
        return uc
    else:
        try:
            repl = uc.encode(encoding)
        except UnicodeError:
            repl = "&#%s;" % data
        return repl

ClientForm.unescape_charref = myunescape_charref
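
One caveat with the workaround above: because of the isdigit() check it also returns 0 for hexadecimal references such as "x41", so the hex branch can never be reached. A possible variant is sketched below. It catches the conversion error instead of pre-checking the input. The name tolerant_unescape_charref is hypothetical, and returning the literal text "&#" + data for a malformed reference is an assumption about what the caller should get back; I have not checked it against ClientForm 0.2.10. The unichr fallback is only there so the sketch also runs on Python 3.

```python
try:
    unichr  # Python 2, which ClientForm targets
except NameError:
    unichr = chr  # Python 3 fallback so this sketch runs standalone


def tolerant_unescape_charref(data, encoding):
    """Decode the body of a numeric character reference ('65' or 'x41').

    Malformed bodies are returned as literal text instead of raising.
    """
    name, base = data, 10
    if name.startswith("x"):
        name, base = name[1:], 16
    try:
        uc = unichr(int(name, base))
    except (ValueError, OverflowError):
        # Not a valid reference (e.g. the 'e' from the traceback):
        # give the text back unchanged. Assumption, see the note above.
        return "&#%s" % data
    if encoding is None:
        return uc
    try:
        return uc.encode(encoding)
    except UnicodeError:
        return "&#%s;" % data
```

It would be installed the same way as in the question, by assigning it to ClientForm.unescape_charref.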

Best answer

The problem is caused by URLs like this one in the page:

http://wow.zet/forum/index.php?showtopic=1197&pid=30419&st=0&#entry30419

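ClientForm expects an integer after &#.

Having # in a URL is fine, but inside HTML the & before it should be escaped (written as &amp;), because &# introduces a numeric character reference.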
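
If you cannot fix the page itself, one option is to clean the HTML before the form parser sees it. The sketch below escapes any & that starts something looking like a numeric character reference but is not a well-formed one. The function name escape_stray_charrefs is hypothetical, and treating references without a trailing semicolon as malformed is a simplifying assumption.

```python
import re

# An '&' followed by '#' that is NOT a well-formed decimal (&#65;) or
# hexadecimal (&#x41;) character reference.
_STRAY_CHARREF = re.compile(r"&(?=#(?![0-9]+;|[xX][0-9a-fA-F]+;))")


def escape_stray_charrefs(html):
    """Rewrite a stray '&#' as '&amp;#' so a parser will not try to decode it."""
    return _STRAY_CHARREF.sub("&amp;", html)


link = '<a href="index.php?showtopic=1197&pid=30419&st=0&#entry30419">'
cleaned = escape_stray_charrefs(link)
```

With mechanize this would presumably be applied to the response body before calling br.forms(), for instance through the response's set_data() followed by br.set_response(). I believe mechanize provides both, but I have not tested this against 0.1.11.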

Source: this question and answer come from Stack Overflow: https://stackoverflow.com/questions/1522125/
