python - Scrapy data in the same item from multiple links on the same page?

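How can I use Scrapy to parse data from multiple links on the same page into the same item? I don't want to save only the data from a single sub-page link; I want something equivalent to this example, with the difference that I want to follow more than one link on the same page: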

A
----> B.1
--------> B.1.1
----> B.2 
--------> B.2.2

In summary, I want to save different kinds of data in the same item from the root page A and from B.1, B.1.1, B.2 and B.2.2. Can someone give me a toy example?

Edit:

Imagine the following code:

from scrapy import Field, Item, Request, Spider


class MyItem(Item):
    # fields filled in by the three callbacks below
    a = Field()
    bi = Field()
    bii = Field()


class MySpider(Spider):
    name = 'myspider'  # placeholder name (Scrapy requires one)
    start_urls = ['http://www.pageA.com']

    def parse(self, response):
        myitem = MyItem()

        # some initial data
        myitem['a'] = response.xpath(...).extract()

        # extract all B.i links
        url_Bi_list = response.xpath(...).extract()

        for url_Bi in url_Bi_list:
            yield Request(url_Bi,
                          ...
                          callback=self.parseBi, meta=dict(item=myitem))

    def parseBi(self, response):
        my_new_item = response.meta['item']

        # some second data
        my_new_item['bi'] = response.xpath(...).extract()

        # extract the B.i.i link (a single URL, not a list)
        url_Bii = response.xpath(...).extract_first()

        yield Request(url_Bii,
                      ...
                      callback=self.parseBii, meta=dict(item=my_new_item))

    def parseBii(self, response):
        final_item = response.meta['item']

        # extract more data from the B.i.i link
        # some third inner data
        final_item['bii'] = response.xpath(...).extract()

        yield final_item

So, would this code structure work? I'm not sure when to yield items and when to yield requests...

Best answer

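To query multiple pages, use the example you just pointed to (chaining callbacks) and use the meta parameter to pass information between callbacks as a dict. Pass the item along to each callback so that it can be returned later, in the last callback.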

def parseA(self, response):
    ...
    myitem = MyItem()
    # populate the item
    ...
    # pass the item along to the next callback through meta
    yield Request(url=<B url>,
                  ...
                  callback=self.parseB, meta=dict(item=myitem))

def parseB(self, response):
    # the same item object that parseA created
    my_new_item = response.meta['item']
    ...
    yield Request(url=<C url>,
                  ...
                  callback=self.parseC, meta=dict(item=my_new_item))

def parseC(self, response):
    final_item = response.meta['item']
    ...
    # last callback in the chain: only now is the item emitted
    yield final_item

Regarding "python - Scrapy data in the same item from multiple links on the same page?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36947822/
