python - Scrapy: merging results into one list

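I have built my first Scrapy project, but I can't get past the last hurdle. With the script below I get one long list in CSV format: first all the product prices, then all the product names.

What I want is for each product's price to appear next to its name. For example: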

Product Name, Product Price
Product Name, Product Price

My Scrapy project:

items.py

from scrapy.item import Item, Field


class PrijsvergelijkingItem(Item):
    Product_ref = Field()
    Product_price = Field()

My spider, in a file named nvdb.py:

from scrapy.spider import BaseSpider
import scrapy.selector
from Prijsvergelijking.items import PrijsvergelijkingItem

class MySpider(BaseSpider):

    name = "nvdb"
    allowed_domains = ["vandenborre.be"]
    start_urls = ["http://www.vandenborre.be/tv-lcd-led/lcd-led-tv-80-cm-alle-producten"]

    def parse(self, response):
        hxs = scrapy.Selector(response)
        titles = hxs.xpath("//ul[@id='prodlist_ul']")
        items = []
        for titles in titles:
            item = PrijsvergelijkingItem()
            item["Product_ref"] = titles.xpath("//div[@class='prod_naam']//text()[2]").extract()
            item["Product_price"] = titles.xpath("//div[@class='prijs']//text()[2]").extract()
            items.append(item)
        return items

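Best answer

You need to make your XPath expressions work in the context of each "product". An expression that starts with // searches the whole document no matter which selector it is called on, so every item ends up holding every name and every price. To make the search relative to the current product, prepend a dot to the expression: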

def parse(self, response):
    products = response.xpath("//ul[@id='prodlist_ul']/li")
    for product in products:
        item = PrijsvergelijkingItem()
        item["Product_ref"] = product.xpath(".//div[@class='prod_naam']//text()[2]").extract_first()
        item["Product_price"] = product.xpath(".//div[@class='prijs']//text()[2]").extract_first()
        yield item
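The effect of the leading dot can be reproduced without Scrapy. The following is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors; it supports only a subset of XPath, and the markup, product names and prices are invented for illustration. It contrasts querying from the document root on every iteration (what the // expressions in the question effectively do) with querying relative to each product.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, loosely modelled on the structure the question assumes.
MARKUP = """
<ul id="prodlist_ul">
  <li><div class="prod_naam">TV A</div><div class="prijs">199</div></li>
  <li><div class="prod_naam">TV B</div><div class="prijs">299</div></li>
</ul>
""".strip()

root = ET.fromstring(MARKUP)  # root is the <ul> element


def scrape_from_root(root):
    """Mimic the question: each query ignores the current product."""
    rows = []
    for _product in root.findall("li"):
        # Searching from the root returns every name and every price each time.
        names = [d.text for d in root.findall(".//div[@class='prod_naam']")]
        prices = [d.text for d in root.findall(".//div[@class='prijs']")]
        rows.append((names, prices))
    return rows


def scrape_per_product(root):
    """Mimic the answer: each query is relative to the current product."""
    rows = []
    for product in root.findall("li"):
        name = product.find(".//div[@class='prod_naam']").text
        price = product.find(".//div[@class='prijs']").text
        rows.append((name, price))
    return rows


if __name__ == "__main__":
    print(scrape_from_root(root))
    print(scrape_per_product(root))
```

The first function yields rows that each contain the full list of names and the full list of prices, while the second yields one (name, price) pair per product, which is the shape the question asks for.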

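I have also improved the code a bit:

I assumed you meant to iterate over the list items (ul->li) rather than just the ul, and fixed the expression accordingly
used the response.xpath() shortcut method
used extract_first() instead of extract(), so each field holds a single string rather than a list
improved the variable naming
used yield instead of collecting the items in a list and then returning it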
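To see why one flat item per product gives the desired "name, price" rows, here is a small sketch that uses the standard library's csv module as a stand-in for Scrapy's CSV export. The helper names parse_products and to_csv and the sample data are hypothetical, and Scrapy's own exporter may format its output differently.

```python
import csv
import io


def parse_products(products):
    """Yield one flat dict per product, one at a time, like a parse() callback that yields items."""
    for name, price in products:
        yield {"Product_ref": name, "Product_price": price}


def to_csv(items):
    """Write the items as CSV, one row per item, and return the text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["Product_ref", "Product_price"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(items)  # accepts any iterable, including a generator
    return buffer.getvalue()


# Invented sample data standing in for scraped values.
SAMPLE = [("TV A", "199"), ("TV B", "299")]
csv_text = to_csv(parse_products(SAMPLE))

if __name__ == "__main__":
    print(csv_text)
```

Because each yielded item holds single strings rather than lists, every product lands on its own row with its name and price side by side.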

This post is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/36889243/
