python - Scrapy - TypeError: Cannot convert unicode body - HtmlResponse has no encoding

Tags: python python-2.7 encoding scrapy

When I try to construct an HtmlResponse object in Scrapy like this:

scrapy.http.HtmlResponse(url=self.base_url + dealer_url[0], body=dealer_html)

I get this error:

Traceback (most recent call last):
  File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\twisted\internet\defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "D:\Kerja\HIT\Python Projects\<project_name>\<project_name>\<project_name>\<project_name>\spiders\fwi.py", line 69, in parse_items
    dealer_page = scrapy.http.HtmlResponse(url=self.base_url + dealer_url[0], body=dealer_html)
  File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\scrapy\http\response\text.py", line 27, in __init__
    super(TextResponse, self).__init__(*args, **kwargs)
  File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\scrapy\http\response\__init__.py", line 18, in __init__
    self._set_body(body)
  File "d:\kerja\hit\python~1\<project_name>\<project_name>\lib\site-packages\scrapy\http\response\text.py", line 43, in _set_body
    type(self).__name__)
TypeError: Cannot convert unicode body - HtmlResponse has no encoding

Does anyone know how to fix this error?

Best answer

HtmlResponse is trying to detect the encoding:

The HtmlResponse class is a subclass of TextResponse which adds encoding auto-discovering support by looking into the HTML meta http-equiv attribute. See TextResponse.encoding.

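So basically, the HTML string you pass as the body argument (dealer_html in your case) does not declare an encoding. According to the w3 docs of http-equiv, it should contain one of the following: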

HTML 4.01: <meta http-equiv="content-type" content="text/html; charset=UTF-8">
HTML5: <meta charset="UTF-8">
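To make "looking into the HTML meta attribute" concrete, here is a rough sketch that searches a byte string for either form of the declaration with a regular expression. It is not Scrapy's actual detection code; the helper name sniff_meta_charset and the 4096-byte search window are assumptions made for illustration only.

```python
import re

# Matches "charset=..." inside a <meta> tag, covering both
# <meta charset="UTF-8"> and the http-equiv form
# (content="text/html; charset=UTF-8").
_META_CHARSET_RE = re.compile(
    br'<meta[^>]*?charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)',
    re.IGNORECASE,
)


def sniff_meta_charset(body):
    """Return the charset declared in a <meta> tag, lowercased, or None.

    `body` must be a byte string. Only the first 4096 bytes are searched
    (an arbitrary limit chosen for this sketch).
    """
    match = _META_CHARSET_RE.search(body[:4096])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

Note that a sketch like this only makes sense for a byte body: a unicode string has already been decoded, so there is nothing left to detect.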

In this case, you can either fix the HTML or specify the encoding through the encoding argument when creating the HtmlResponse object:

HtmlResponse(url='http://scrapy.org', body=u'some body', encoding='utf-8')
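Judging from the traceback, the TypeError is raised in _set_body when a unicode body arrives without an encoding, so passing encoding explicitly is the most direct fix. Below is a minimal sketch of a wrapper that accepts either text or bytes. The helper names as_bytes and make_html_response are hypothetical, and UTF-8 is only an assumed default; use whatever encoding your pages actually have.

```python
def as_bytes(html, encoding="utf-8"):
    """Return `html` as a byte string, encoding it only if it is text."""
    if isinstance(html, bytes):
        return html
    return html.encode(encoding)


def make_html_response(url, html, encoding="utf-8"):
    """Build an HtmlResponse from text or bytes (hypothetical helper)."""
    # Imported lazily so as_bytes() can be used without Scrapy installed.
    from scrapy.http import HtmlResponse

    # The body is always handed over as bytes, and the encoding is stated
    # explicitly, so Scrapy never has to guess how to convert it.
    return HtmlResponse(url=url, body=as_bytes(html, encoding), encoding=encoding)
```

Applied to the question, the failing line would become something like dealer_page = make_html_response(self.base_url + dealer_url[0], dealer_html).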

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/39237259/
