python - XPath cannot select only one HTML tag

Tags: python python-3.x xpath web-scraping scrapy

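I am trying to scrape some data from a website, but the code below returns every matching element, and I only want the first match. I tried extract_first, but it returned nothing.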

# -*- coding: utf-8 -*-
import scrapy
from gumtree.items import GumtreeItem



class FlatSpider(scrapy.Spider):
    name = "flat"
    allowed_domains = ["gumtree.com"]
    start_urls = (
        'https://www.gumtree.com/flats-for-sale',
    )

    def parse(self, response):
        item = GumtreeItem()
        item['title'] = response.xpath('//*[@class="listing-title"][1]/text()').extract()
        return item

How can I select only one element with an XPath selector?

Best answer

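This happens because the first match is actually empty. Keep only the non-empty text nodes by adding the [normalize-space(.)] predicate, then call extract_first(). This worked for me: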

$ scrapy shell "https://www.gumtree.com/flats-for-sale" -s USER_AGENT="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.113 Safari/537.36"
In [1]: response.xpath('//*[@class="listing-title"][1]/text()[normalize-space(.)]').extract_first().strip()
Out[1]: u'REDUCED to sell! Stunning Hove sea view flat.'
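
The same filtering can also be done in Python after extraction instead of inside the XPath expression. The following is a minimal sketch, not part of the original answer: first_non_empty is a hypothetical helper, and the sample list is illustrative data shaped like the output of .extract() when the leading text nodes are blank.

```python
def first_non_empty(values):
    """Return the first value that is not blank after stripping, else None."""
    for value in values:
        text = value.strip()
        if text:
            return text
    return None


# Illustrative data (an assumption): a list such as .extract() would return
# when the first text nodes contain only whitespace.
extracted = [
    "\n",
    "  ",
    "\nREDUCED to sell! Stunning Hove sea view flat.\n",
    "Another listing",
]

title = first_non_empty(extracted)
print(title)
```

In the spider, this would mean passing the result of .extract() to the helper before assigning item['title']. Filtering in the XPath expression, as in the answer, keeps the logic in one place; filtering in Python may be easier to read if the expression grows complicated.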

This post on "python - XPath cannot select only one HTML tag" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/39574222/
