python - Scrapy response.replace encoding error

Tags: python scrapy

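I'm trying to use response.replace() to replace the response body of a Google search results page with one of its search result blocks, but I'm running into encoding problems.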

scrapy shell "http://www.google.de/search?q=Zuckerccc"

>>> srb = hxs.select("//li[@class='g']").extract()
>>> body = '<html><body>' + srb[0] + '</body></html>'    # get only 1st search result block
>>> b = response.replace(body = body)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "scrapy/lib/python2.6/site-packages/scrapy/http/response/text.py", line 54, in replace
    return Response.replace(self, *args, **kwargs)
  File "scrapy/lib/python2.6/site-packages/scrapy/http/response/__init__.py", line 77, in replace
    return cls(*args, **kwargs)
  File "scrapy/lib/python2.6/site-packages/scrapy/http/response/text.py", line 31, in __init__
    super(TextResponse, self).__init__(*args, **kwargs)
  File "scrapy/lib/python2.6/site-packages/scrapy/http/response/__init__.py", line 19, in __init__
    self._set_body(body)
  File "scrapy/lib/python2.6/site-packages/scrapy/http/response/text.py", line 48, in _set_body
    self._body = body.encode(self._encoding)
  File "../local_1/Linux-2.6c2.5-x86_64/Python/Python-147.0-0/lib/python2.6/encodings/cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0131' in position 529: character maps to <undefined>
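The error itself does not depend on Scrapy. Judging by the `cp1252.py` frame, `TextResponse._set_body` encodes the unicode body with the response's encoding (cp1252 here), and U+0131 (LATIN SMALL LETTER DOTLESS I) has no cp1252 mapping. A minimal Python 3 sketch that reproduces the failure; the sample string is made up and `try_encode` is a hypothetical helper:

```python
# U+0131 is the character named in the traceback (LATIN SMALL LETTER DOTLESS I).
# The surrounding HTML is an invented sample, not Google's real markup.
body = "<html><body>Zucker\u0131</body></html>"


def try_encode(text, encoding):
    """Return (bytes, None) on success, or (None, error) if the codec fails."""
    try:
        return text.encode(encoding), None
    except UnicodeEncodeError as exc:
        return None, exc


# cp1252 has no slot for U+0131, so this fails just like _set_body did
cp1252_bytes, cp1252_error = try_encode(body, "cp1252")

# UTF-8 can represent every Unicode code point, so this succeeds
utf8_bytes, utf8_error = try_encode(body, "utf-8")
```

`cp1252_error.start` points at the offending character, the same information as "position 529" in the real traceback.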

I also tried creating my own response:

>>> x = HtmlResponse("http://www.google.de/search?q=Zuckerccc", body = body, encoding = response.encoding)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "scrapy/lib/python2.6/site-packages/scrapy/http/response/text.py", line 31, in __init__
    super(TextResponse, self).__init__(*args, **kwargs)
  File "scrapy/lib/python2.6/site-packages/scrapy/http/response/__init__.py", line 19, in __init__
    self._set_body(body)
  File "scrapy/lib/python2.6/site-packages/scrapy/http/response/text.py", line 48, in _set_body
    self._body = body.encode(self._encoding)
  File "../local_1/Linux-2.6c2.5-x86_64/Python/Python-147.0-0/lib/python2.6/encodings/cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0131' in position 529: character maps to <undefined>

However, it works when I pass _body_declared_encoding() as the encoding in replace():

>>> b = response.replace(body = body, encoding = response._body_declared_encoding())
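For context, `_body_declared_encoding()` is a private Scrapy helper that, as far as I know, delegates to w3lib's `html_body_declared_encoding` and returns the charset declared inside the HTML itself, such as a `<meta charset=...>` tag or an `http-equiv` Content-Type tag. Below is a simplified, hypothetical stdlib approximation (`declared_charset` is an illustrative name, not a Scrapy or w3lib API, and the 4096-byte limit is an assumption):

```python
import re

# Simplified approximation: w3lib's real parser handles more cases
# (XML declarations, HTML comments, unusual attribute syntax, ...).
_META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([-\w.:]+)""",
    re.IGNORECASE,
)


def declared_charset(body, limit=4096):
    """Return the charset named by the first <meta ... charset=...> tag, or None.

    Only the first `limit` bytes are scanned, because the declaration is
    expected near the top of the document.
    """
    match = _META_CHARSET.search(body[:limit])
    return match.group(1).decode("ascii").lower() if match else None
```

Both forms of the declaration are matched: `<meta charset="UTF-8">` gives `"utf-8"`, and `<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">` gives `"iso-8859-1"`.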

I don't understand why response._body_declared_encoding() and response.encoding differ. Can anyone shed some light on this?
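As far as I know, Scrapy's `TextResponse.encoding` resolves the encoding in a fixed order: an `encoding=` passed to the constructor (or to `replace()`), then the charset in the HTTP `Content-Type` header, then the encoding declared in the HTML body, and only then an encoding inferred from the bytes (newer releases also check for a BOM before the header). `_body_declared_encoding()` looks only at the body, so if the header and the `<meta>` tag disagree, the two values differ and the header wins for `response.encoding`. w3lib also swaps some labels for supersets, for example ISO-8859-1 to cp1252, which could explain the cp1252 codec in the traceback; whether Google actually sent such a header here is an assumption. A hypothetical stdlib sketch of that precedence (`resolve_response_encoding` is an illustrative name, not a Scrapy API, and the translation table is a one-entry excerpt):

```python
import codecs

# Assumption: a one-entry excerpt of w3lib's encoding translation table,
# which replaces some labels with the superset that browsers actually use.
_TRANSLATION = {"iso8859-1": "cp1252"}


def _normalize(label):
    """Map an encoding label to Python's codec name, then apply _TRANSLATION."""
    if not label:
        return None
    try:
        name = codecs.lookup(label).name  # e.g. "ISO-8859-1" -> "iso8859-1"
    except LookupError:
        return None  # unknown label: treat it as if it were absent
    return _TRANSLATION.get(name, name)


def resolve_response_encoding(explicit=None, header_charset=None,
                              meta_charset=None, inferred="utf-8"):
    """Return the first usable encoding, in Scrapy-like precedence order:
    explicit argument > HTTP Content-Type charset > <meta> charset > inferred.
    """
    for candidate in (explicit, header_charset, meta_charset):
        resolved = _normalize(candidate)
        if resolved:
            return resolved
    return _normalize(inferred)
```

With `header_charset="ISO-8859-1"` and `meta_charset="utf-8"` this returns `"cp1252"`, while the body alone says UTF-8, which is the kind of mismatch described in the question.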

So, what is a good way to solve this?

Best answer

I managed to replace the response body with these lines of code:

scrapy shell "http://www.google.de/search?q=Zuckerccc"
>>> google_result = response.xpath('//li[@class="g"]').extract()[0]
>>> body = '<html><body>' + google_result + '</body></html>'
>>> b = response.replace(body = body)
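A more defensive variant (a sketch, not part of the original answer) avoids the implicit encode entirely: encode the new body to bytes yourself and pass a matching `encoding=` to `replace()`, so `response.encoding` (cp1252 in the question) is never used to encode text. `replace_body_as_utf8` and `encode_with_char_refs` are hypothetical helper names; the second one shows the alternative of keeping the original encoding and turning unencodable characters into numeric character references, which HTML parsers decode back to the same character.

```python
def replace_body_as_utf8(response, fragment):
    """Wrap an extracted HTML fragment and swap it into `response` as UTF-8.

    Because the body is already bytes and the encoding is given explicitly,
    the response does not need to encode any text with response.encoding.
    """
    body = u"<html><body>" + fragment + u"</body></html>"
    return response.replace(body=body.encode("utf-8"), encoding="utf-8")


def encode_with_char_refs(text, encoding):
    """Keep the original encoding; characters it cannot represent become &#NNN;."""
    return text.encode(encoding, "xmlcharrefreplace")
```

In the shell session above this would be `b = replace_body_as_utf8(response, google_result)`. With the second approach, `encode_with_char_refs(body, response.encoding)` turns U+0131 into `&#305;` instead of raising `UnicodeEncodeError`.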

Regarding python - Scrapy response.replace encoding error, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/17937159/
