python - Scrapy > IndexError: list index out of range

Tags: python xpath scrapy tripadvisor

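I'm trying to scrape some data from TripAdvisor. I'm interested in the restaurants' "Price range / Cuisines / Meals" details.

So I use the following XPath to extract each of these 3 rows, which share the same class: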

response.xpath('//div[@class="restaurants-detail-overview-cards-DetailsSectionOverviewCard__categoryTitle--14zKt"]/text()').extract()[1]

I tested it directly in the Scrapy shell, and it works fine:

scrapy shell https://www.tripadvisor.com/Restaurant_Review-g187514-d15364769-Reviews-La_Gaditana_Castellana-Madrid.html

But when I integrate it into my script, I get the following error:

Traceback (most recent call last):
  File "/usr/lib64/python3.6/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/lib64/python3.6/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/usr/lib64/python3.6/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/lib64/python3.6/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/lib64/python3.6/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/root/Scrapy_TripAdvisor_Restaurant-master/tripadvisor_las_vegas/tripadvisor_las_vegas/spiders/res_las_vegas.py", line 64, in parse_listing
    (response.xpath('//div[@class="restaurants-details-card-TagCategories__categoryTitle--o3o2I"]/text()')[1])
  File "/usr/lib/python3.6/site-packages/parsel/selector.py", line 61, in __getitem__
    o = super(SelectorList, self).__getitem__(pos)
IndexError: list index out of range
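The traceback ends in SelectorList.__getitem__, which delegates to ordinary list indexing, so taking [1] from an empty result produces exactly this message. A minimal reproduction, assuming Python's standard xml.etree.ElementTree as a stand-in for Scrapy's selectors (it is not parsel) and a hypothetical, heavily simplified page that only has the overview-card layout; only the class names come from the question:

```python
import xml.etree.ElementTree as ET
from typing import List

# Class names copied from the question; the page below is a hypothetical,
# heavily simplified overview-card page, not real TripAdvisor markup.
OVERVIEW_CLASS = "restaurants-detail-overview-cards-DetailsSectionOverviewCard__categoryTitle--14zKt"
CARD_CLASS = "restaurants-details-card-TagCategories__categoryTitle--o3o2I"

OVERVIEW_PAGE = f"""
<html><body>
  <div class="{OVERVIEW_CLASS}">PRICE RANGE</div>
  <div class="{OVERVIEW_CLASS}">CUISINES</div>
</body></html>
"""


def titles(page: str, css_class: str) -> List[str]:
    """Return the text of every <div> with exactly this class (a stand-in for .getall())."""
    root = ET.fromstring(page.strip())
    return [div.text for div in root.findall(f".//div[@class='{css_class}']")]


print(titles(OVERVIEW_PAGE, OVERVIEW_CLASS))  # ['PRICE RANGE', 'CUISINES']
card_titles = titles(OVERVIEW_PAGE, CARD_CLASS)
print(card_titles)  # [] because this page has no card layout

try:
    card_titles[1]  # same operation as SelectorList[1] in the spider
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```

On a page like this one, the overview-card query succeeds while the card query returns nothing, which is the situation the spider hits.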

Here is part of my code, with an explanation below:

# extract restaurant cuisine
row_cuisine_overviewcard = \
    (response.xpath('//div[@class="restaurants-detail-overview-cards-DetailsSectionOverviewCard__categoryTitle--14zKt"]/text()')[1])
row_cuisine_card = \
    (response.xpath('//div[@class="restaurants-details-card-TagCategories__categoryTitle--o3o2I"]/text()')[1])


if (row_cuisine_overviewcard == "CUISINES"):
    cuisine = \
        response.xpath('//div[@class="restaurants-detail-overview-cards-DetailsSectionOverviewCard__tagText--1XLfi"]/text()')[1]
elif (row_cuisine_card == "CUISINES"):
    cuisine = \
        response.xpath('//div[@class="restaurants-details-card-TagCategories__tagText--2170b"]/text()')[1]
else:
    cuisine = None

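On TripAdvisor, restaurant pages come in 2 different types with 2 different formats: the first uses the overview-card classes (DetailsSectionOverviewCard), the second uses the card classes (TagCategories).

So I want to check whether the first one (overviewcard) exists; if it doesn't, try the second one (card); and if that doesn't exist either, set the value to None.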
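The fallback described above (overview card, then card, then None) can be written so that no index is ever taken from a list that might be too short. A minimal sketch, assuming each layout's category titles and tag texts have already been extracted with .getall() as plain lists of strings, and assuming (as the code above does) that the cuisine sits at index 1 in both lists; pick_cuisine is a hypothetical name and the sample values are made up:

```python
from typing import List, Optional, Sequence, Tuple

# One (category_titles, tag_texts) pair per page layout, e.g. the results of two
# response.xpath(...).getall() calls. List order is priority order.
Layout = Tuple[Sequence[str], Sequence[str]]


def pick_cuisine(layouts: List[Layout], index: int = 1,
                 label: str = "CUISINES") -> Optional[str]:
    """Return tags[index] from the first layout whose titles[index] equals label."""
    for titles, tags in layouts:
        # Check both lengths before indexing: a layout that is missing from
        # the page yields empty lists and is skipped instead of raising IndexError.
        if len(titles) > index and len(tags) > index and titles[index] == label:
            return tags[index]
    return None  # no layout matched


# Made-up sample values: this page has only the overview-card layout.
overview = (["PRICE RANGE", "CUISINES"], ["price-text", "cuisine-text"])
card = ([], [])
print(pick_cuisine([overview, card]))  # cuisine-text
```

In a spider, each pair would come from the matching categoryTitle and tagText XPaths, both called with .getall(), so a missing layout simply produces two empty lists.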

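:D But it looks like Python executes both... and since the second one doesn't exist on the page, the script stops.

Could it be an indentation error?

Thanks for your help. Regards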

Best answer

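Your second selector (row_cuisine_card) fails because that element does not exist on the page. Python does run both lines, but it is not an indentation problem: both assignments come before the if/elif, so both XPath queries are evaluated on every page. When you then access [1] on the second result, it raises IndexError because the result list is empty.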
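To see why both lookups run, compare evaluating them up front (as the two row_cuisine_* assignments do) with passing them as callables and calling each one only when the previous one found nothing. A minimal sketch in plain Python; lookup_overview, lookup_card and first_hit are hypothetical stand-ins for the two response.xpath(...).getall() calls, with hard-coded return values:

```python
from typing import Callable, List, Optional

calls: List[str] = []  # records which lookups actually ran


def lookup_overview() -> List[str]:
    calls.append("overview")
    return ["PRICE RANGE", "CUISINES"]  # hypothetical: overview layout present


def lookup_card() -> List[str]:
    calls.append("card")
    return []  # hypothetical: card layout absent


def first_hit(lookups: List[Callable[[], List[str]]], index: int = 1) -> Optional[str]:
    """Call each lookup in turn and stop at the first one long enough to index."""
    for lookup in lookups:
        values = lookup()
        if len(values) > index:
            return values[index]
    return None


# Eager: both assignments run before any if/elif is reached,
# like the two row_cuisine_* lines in the question.
overview = lookup_overview()
card = lookup_card()
print(calls)  # ['overview', 'card']

# Lazy: the card lookup is never called because the overview one succeeded.
calls.clear()
print(first_hit([lookup_overview, lookup_card]))  # CUISINES
print(calls)  # ['overview']
```

Deferring the second lookup is optional; guarding every index, as in the code below, is enough on its own.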

Assuming you really do want item 1, try this:

# Here we get all the values, even if the list is empty.
row_cuisine_overviewcard = \
    response.xpath('//div[@class="restaurants-detail-overview-cards-DetailsSectionOverviewCard__categoryTitle--14zKt"]/text()').getall()
row_cuisine_card = \
    response.xpath('//div[@class="restaurants-details-card-TagCategories__categoryTitle--o3o2I"]/text()').getall()

# Check first that each result has more than 1 item, and only then compare item [1].
if len(row_cuisine_overviewcard) > 1 and row_cuisine_overviewcard[1] == "CUISINES":
    tags = response.xpath('//div[@class="restaurants-detail-overview-cards-DetailsSectionOverviewCard__tagText--1XLfi"]/text()').getall()
elif len(row_cuisine_card) > 1 and row_cuisine_card[1] == "CUISINES":
    tags = response.xpath('//div[@class="restaurants-details-card-TagCategories__tagText--2170b"]/text()').getall()
else:
    tags = []

# .getall() returns strings, so cuisine is text (not a Selector object), or None.
cuisine = tags[1] if len(tags) > 1 else None

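Apply the same kind of safety check whenever you take a specific index from a selector result. In other words, make sure a value exists before you access it.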
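That rule can be packaged as a small helper. For index 0, parsel's SelectorList.get(default=...) (extract_first in older Scrapy) already returns a default instead of raising; for other positions, here is a minimal sketch of a hypothetical helper, nth, that works on the plain list returned by .getall():

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def nth(values: Sequence[T], index: int, default: Optional[T] = None) -> Optional[T]:
    """Return values[index] if that position exists, otherwise default."""
    # Negative indexes are treated as missing instead of counting from the end.
    if 0 <= index < len(values):
        return values[index]
    return default


print(nth([], 1))                           # None
print(nth(["PRICE RANGE", "CUISINES"], 1))  # CUISINES
print(nth(["PRICE RANGE"], 1, "n/a"))       # n/a
```

In the spider it would be called as nth(response.xpath('//div[@class="restaurants-details-card-TagCategories__tagText--2170b"]/text()').getall(), 1), which returns None on pages without the card layout instead of stopping the callback.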

Original question on Stack Overflow (python - Scrapy > IndexError: list index out of range): https://stackoverflow.com/questions/54633323/
