python - How do I fix "TypeError: Cannot mix str and non-str arguments"?

Tags: python string scrapy typeerror

I am writing some scraping code and ran into the error above.
My code is below.

# -*- coding: utf-8 -*-
import scrapy
from myproject.items import Headline


class NewsSpider(scrapy.Spider):
    name = 'IC'
    allowed_domains = ['kosoku.jp']
    start_urls = ['http://kosoku.jp/ic.php']

    def parse(self, response):
        """
        extract target urls and combine them with the main domain
        """
        for url in response.css('table a::attr("href")'):
            yield(scrapy.Request(response.urljoin(url), self.parse_topics))

    def parse_topics(self, response):
        """
        pick up necessary information
        """
        item=Headline()
        item["name"]=response.css("h2#page-name ::text").re(r'.*(インターチェンジ)')
        item["road"]=response.css("div.ic-basic-info-left div:last-of-type ::text").re(r'.*道$')
        yield item

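When I run these statements individually in the shell, I get the correct responses, but when the spider runs as a program, it fails with the following error:
2017-11-27 18:26:17 [scrapy.core.scraper] ERROR: Spider error processing <GET http://kosoku.jp/ic.php> (referer: None)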
Traceback (most recent call last):
  File "/Users/sonogi/envs/scrapy/lib/python3.5/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/Users/sonogi/envs/scrapy/lib/python3.5/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/Users/sonogi/envs/scrapy/lib/python3.5/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/Users/sonogi/envs/scrapy/lib/python3.5/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/Users/sonogi/envs/scrapy/lib/python3.5/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/Users/sonogi/scraping/myproject/myproject/spiders/IC.py", line 16, in parse
    yield(scrapy.Request(response.urljoin(url), self.parse_topics))
  File "/Users/sonogi/envs/scrapy/lib/python3.5/site-packages/scrapy/http/response/text.py", line 82, in urljoin
    return urljoin(get_base_url(self), url)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/parse.py", line 424, in urljoin
    base, url, _coerce_result = _coerce_args(base, url)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/parse.py", line 120, in _coerce_args
    raise TypeError("Cannot mix str and non-str arguments")
TypeError: Cannot mix str and non-str arguments
2017-11-27 18:26:17 [scrapy.core.engine] INFO: Closing spider (finished)

I am confused and would appreciate any help!

Best answer

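According to the Scrapy documentation, the .css(selector) method you are using returns a SelectorList instance. Iterating over it yields Selector objects rather than strings, so response.urljoin(url) passes a non-str argument to urllib's urljoin, which raises the TypeError. To get the actual (unicode) string version of each URL, call the extract() method: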

def parse(self, response):
    for url in response.css('table a::attr("href")').extract():
        yield(scrapy.Request(response.urljoin(url), self.parse_topics))
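The underlying failure can be reproduced without Scrapy, which may help show why extract() fixes it. The following is only a minimal sketch using the standard library's urljoin: FakeSelector is a hypothetical stand-in for a Scrapy Selector (it is not part of Scrapy), and the href value is made up for illustration.

```python
from urllib.parse import urljoin


class FakeSelector:
    """Hypothetical stand-in for a Scrapy Selector: wraps a string but is not a str."""

    def __init__(self, value):
        self._value = value

    def extract(self):
        # Mirrors the idea of Selector.extract(): hand back the plain string.
        return self._value


base = "http://kosoku.jp/ic.php"
selector = FakeSelector("/ic.php?id=1")  # made-up href, for illustration only

error_message = None
try:
    # Mixing a str base with a non-str object triggers the same TypeError
    # that appears at the bottom of the traceback.
    urljoin(base, selector)
except TypeError as exc:
    error_message = str(exc)

# Passing the extracted string instead lets urljoin resolve the URL normally.
joined = urljoin(base, selector.extract())

print(error_message)
print(joined)
```

In the spider itself, the same idea applies: response.urljoin() should receive the string returned by extract(), not the Selector object.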

Source: the Stack Overflow question "How do I fix 'TypeError: Cannot mix str and non-str arguments'?", https://stackoverflow.com/questions/47507835/
