python - Limiting the number of items Scrapy can collect

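I am using Scrapy to collect some data. My Scrapy program collects 100 items in one session. I need to limit that to 50, or to any arbitrary number. How can I do this? Any solution is welcome. Thanks in advance.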

# -*- coding: utf-8 -*-
import re
import scrapy


class DmozItem(scrapy.Item):
    # define the fields for your item here like:
    link = scrapy.Field()
    attr = scrapy.Field()
    title = scrapy.Field()
    tag = scrapy.Field()


class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["raleigh.craigslist.org"]
    start_urls = [
        "http://raleigh.craigslist.org/search/bab"
    ]

    BASE_URL = 'http://raleigh.craigslist.org/'

    def parse(self, response):
        links = response.xpath('//a[@class="hdrlnk"]/@href').extract()
        for link in links:
            # urljoin resolves the href against the page URL and avoids a doubled slash
            absolute_url = response.urljoin(link)
            yield scrapy.Request(absolute_url, callback=self.parse_attr)

    def parse_attr(self, response):
        match = re.search(r"(\w+)\.html", response.url)
        if match:
            item_id = match.group(1)
            url = self.BASE_URL + "reply/ral/bab/" + item_id

            item = DmozItem()
            item["link"] = response.url
            item["title"] = "".join(response.xpath("//span[@class='postingtitletext']//text()").extract())
            # extract_first avoids an IndexError when the page has no attribute group
            item["tag"] = response.xpath("//p[@class='attrgroup']/span/b/text()").extract_first(default="")
            return scrapy.Request(url, meta={'item': item}, callback=self.parse_contact)

    def parse_contact(self, response):
        item = response.meta['item']
        item["attr"] = "".join(response.xpath("//div[@class='anonemail']//text()").extract())
        return item

Best answer

This is exactly what the CloseSpider extension and its CLOSESPIDER_ITEMCOUNT setting are for. From the Scrapy documentation:

An integer which specifies a number of items. If the spider scrapes more than that amount of items and those items are passed by the item pipeline, the spider will be closed with the reason closespider_itemcount. If zero (or not set), spiders won’t be closed by number of passed items.
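
As a minimal sketch of how the setting might be applied to the spider in the question (assuming a standard Scrapy project with a settings.py; the value 50 is simply the figure the asker mentioned):

```python
# settings.py (project-wide settings; assumes a standard Scrapy project layout)

# Ask the CloseSpider extension to stop the crawl once 50 items have
# passed through the item pipeline. 0 (the default) disables the limit.
CLOSESPIDER_ITEMCOUNT = 50

# To scope the limit to a single spider instead, the same key can go in
# that spider's `custom_settings` class attribute:
#
#     class DmozSpider(scrapy.Spider):
#         name = "dmoz"
#         custom_settings = {"CLOSESPIDER_ITEMCOUNT": 50}
```

The same value can also be passed for a single run with `scrapy crawl dmoz -s CLOSESPIDER_ITEMCOUNT=50`. Note that the limit is soft: the spider shuts down gracefully, so requests already in flight are normally still processed and the final count may end up slightly above 50.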

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/30941333/
