python-3.x - Scrapy extract() method raises "Cannot mix str and non-str arguments" error

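I am currently learning Scrapy and am building a simple scraper for a real estate website. With this code I am trying to scrape the URLs of all real estate listings for a particular city. My code fails with the following error: "Cannot mix str and non-str arguments".

I believe I have isolated the problem to this part of my code: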

props = response.xpath('//div[@class = "address ellipsis"]/a/@href').extract()

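If I use extract_first() instead of extract() in the props XPath assignment, the code runs without error and picks up the first property link on each page. That is not what I ultimately want, though, since I need every link. Because the code runs with extract_first(), I believe my XPath expression itself is correct.

Can someone explain what I am doing wrong here? My full code is listed below: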

import scrapy
from scrapy.http import Request

class AdvancedSpider(scrapy.Spider):
    name = 'advanced'
    allowed_domains = ['www.realtor.com']
    start_urls = ['http://www.realtor.com/realestateandhomes-search/Houston_TX/']

    def parse(self, response):
        props = response.xpath('//div[@class = "address ellipsis"]/a/@href').extract()

        for prop in props:
            absolute_url = response.urljoin(props)
            yield Request(absolute_url, callback=self.parse_props)

        next_page_url = response.xpath('//a[@class = "next"]/@href').extract_first()
        absolute_next_page_url = response.urljoin(next_page_url)
        yield scrapy.Request(absolute_next_page_url)

    def parse_props(self, response):
        pass

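Please let me know if there is anything I can clarify.

Best answer

You are passing props, the whole list of strings, to response.urljoin(), when you meant to pass prop, the single string from the current loop iteration: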

for prop in props:
    absolute_url = response.urljoin(prop)
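
This also explains why extract_first() appeared to work: it returns a single string, while extract() returns a list. The difference can be reproduced without Scrapy. The sketch below assumes that response.urljoin() builds on the standard library's urllib.parse.urljoin, which is where this error message comes from. The base_url value and the two hrefs are made-up stand-ins for response.url and the extracted links, not real listings.

```python
from urllib.parse import urljoin

# Hypothetical stand-ins for response.url and the result of .extract().
base_url = "http://www.realtor.com/realestateandhomes-search/Houston_TX/"
props = [
    "/realestateandhomes-detail/example-1",
    "/realestateandhomes-detail/example-2",
]

# Passing the whole list mixes a str (the base URL) with a non-str (the list),
# which is the situation the question's loop creates.
try:
    urljoin(base_url, props)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Passing one string at a time is what the corrected loop does.
absolute_urls = [urljoin(base_url, prop) for prop in props]

print(error_message)
print(absolute_urls)
```

Each href starts with a slash, so it replaces the path of the base URL and keeps only its scheme and host.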

A similar question about this Scrapy "Cannot mix str and non-str arguments" error can be found on Stack Overflow: https://stackoverflow.com/questions/54209812/
