python - Scrapy (Python) TypeError: unhashable type: 'list'

Tags: python scrapy

I have this simple code, but I get the error below when it reaches the response.urljoin(port_homepage_url) call.

import re

import scrapy
from vesseltracker.items import VesseltrackerItem


class GetVessel(scrapy.Spider):
    name = "getvessel"
    allowed_domains = ["marinetraffic.com"]
    start_urls = [
        'http://www.marinetraffic.com/en/ais/index/ports/all/flag:AE',
    ]

    def parse(self, response):
        item = VesseltrackerItem()
        for ports in response.xpath('//table/tr[position()>1]'):
            item['port_name'] = ports.xpath('td[2]/a/text()').extract()
            port_homepage_url = ports.xpath('td[7]/a/@href').extract()
            port_homepage_url = response.urljoin(port_homepage_url)
            yield scrapy.Request(port_homepage_url, callback=self.parse, meta={'item': item})

What is wrong here?

Here is the error log.

2016-09-30 17:17:13 [scrapy] DEBUG: Crawled (200) <GET http://www.marinetraffic.com/robots.txt> (referer: None)
2016-09-30 17:17:14 [scrapy] DEBUG: Crawled (200) <GET http://www.marinetraffic.com/en/ais/index/ports/all/flag:AE> (referer: None)
2016-09-30 17:17:14 [scrapy] ERROR: Spider error processing <GET http://www.marinetraffic.com/en/ais/index/ports/all/flag:AE> (referer: None)
Traceback (most recent call last):
  File "/Users/noussh/python/env/lib/python2.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/Users/noussh/python/env/lib/python2.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/Users/noussh/python/env/lib/python2.7/site-packages/scrapy/spidermiddlewares/referer.py", line 22, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/Users/noussh/python/env/lib/python2.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/Users/noussh/python/env/lib/python2.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/Users/noussh/python/vesseltracker/vesseltracker/spiders/marinetraffic.py", line 19, in parse
    port_homepage_url = response.urljoin(port_homepage_url)
  File "/Users/noussh/python/env/lib/python2.7/site-packages/scrapy/http/response/text.py", line 78, in urljoin
    return urljoin(get_base_url(self), url)
  File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urlparse.py", line 261, in urljoin
    urlparse(url, bscheme, allow_fragments)
  File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urlparse.py", line 143, in urlparse
    tuple = urlsplit(url, scheme, allow_fragments)
  File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urlparse.py", line 176, in urlsplit
    cached = _parse_cache.get(key, None)
TypeError: unhashable type: 'list'

Best answer

ports.xpath('td[7]/a/@href').extract() returns a list, and response.urljoin() fails when it is called on that list, because it expects a single string. Use extract_first() instead:

port_homepage_url = ports.xpath('td[7]/a/@href').extract_first()
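
To see why the message says "unhashable", note that the last frame of the traceback is a cache lookup in urlsplit, and a key that contains a list cannot be hashed. The following is a minimal standard-library sketch of that idea, not Scrapy code: BASE_URL reuses the start URL from the question, the href value is a made-up placeholder, and is_hashable and first_or_default are hypothetical helpers (the latter only imitates what extract_first() does).

```python
from urllib.parse import urljoin

# Assumption: the base URL is the start URL from the question; the href
# below is a hypothetical placeholder, not real scraped data.
BASE_URL = "http://www.marinetraffic.com/en/ais/index/ports/all/flag:AE"
extracted = ["/en/ais/details/ports/1"]  # shape of what .extract() returns: a list


def is_hashable(value):
    """Return True if value can be used as (part of) a dict key."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


def first_or_default(values, default=None):
    """Imitate extract_first(): first element, or default if the list is empty."""
    return values[0] if values else default


# A key that contains a list cannot be hashed, which is what the
# urlsplit cache lookup in the traceback ran into.
list_key_ok = is_hashable((extracted, ""))
# The same key built from a single string hashes without trouble.
str_key_ok = is_hashable((first_or_default(extracted), ""))

# Join only when a link was actually found in the row.
href = first_or_default(extracted)
absolute_url = urljoin(BASE_URL, href) if href else None
print(list_key_ok, str_key_ok, absolute_url)
```

In the spider itself the same guard is worth considering: extract_first() returns None when the row has no matching link, so checking the value before building the Request avoids requesting an unintended URL.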

Regarding python - Scrapy (Python) TypeError: unhashable type: 'list', a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39792600/
