python - normalize-space works only with XPath, not with CSS selectors

I am extracting data with Scrapy and Python.

The data sometimes contains extra whitespace. In XPath I use normalize-space to remove it:

xpath('normalize-space(.//li[2]/strong/text())').extract()

This works fine. Now, however, I want to use normalize-space with a CSS selector.

I tried this:

car['Location'] = site.css('normalize-space(div[class=location]::text)').extract()

The result is empty, but if I remove normalize-space I get the correct result.

How can I use it with a CSS selector?

def normalize_whitespace(str):
    import re
    str = str.strip()
    str = re.sub(r'\s+', ' ', str)
    return str

I call the function like this:

car['Location'] = normalize_whitespace(site.css('div[class=location]::text').extract())

But the result I get is empty. Why is that?

Best answer

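Unfortunately, XPath functions cannot be used inside CSS selectors in Scrapy.

You could first translate the div[class=location]::text CSS selector into its equivalent XPath expression, then wrap that in normalize-space() and pass the result to .xpath().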

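In any case, since you only care about the final whitespace-normalized string, you can get the same result by applying a Python function to the output extracted by the CSS selector.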

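import re

def normalize_whitespace(text):
    # Trim leading/trailing whitespace, then collapse each internal run
    # of whitespace (spaces, tabs, newlines) into a single space.
    text = text.strip()
    return re.sub(r'\s+', ' ', text)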

If you include this function somewhere in your Scrapy project, you can use it like this:

    car['Location'] = normalize_whitespace(
        u''.join(site.css('div[class=location]::text').extract()))

or

    car['Location'] = normalize_whitespace(
        site.css('div[class=location]::text').extract()[0])
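To see how the two calls differ, here is a small self-contained sketch. It replaces the Scrapy call with a plain list, because .extract() returns a list with one string per matched text node; the sample strings are invented. The function is repeated with a neutral parameter name so the snippet runs on its own.

```python
import re

def normalize_whitespace(text):
    """Trim the ends and collapse internal whitespace runs to single spaces."""
    return re.sub(r'\s+', ' ', text.strip())

# Stand-in for site.css('div[class=location]::text').extract();
# the values are made up for this example.
extracted = ['\n  New   York,', '\n  NY  ']

joined = normalize_whitespace(''.join(extracted))  # uses every text node
first = normalize_whitespace(extracted[0])         # uses the first node only

print(joined)  # New York, NY
print(first)   # New York,
```

This also suggests why passing the .extract() result straight to the function does not work: the function expects a string, while .extract() hands back a list. The [0] variant additionally assumes at least one match and raises IndexError on an empty list, whereas the join variant simply yields an empty string.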

Source: the original question, "python - normalize-space just works with xpath not css selector", on Stack Overflow: https://stackoverflow.com/questions/21118582/