python - Scraping LD+JSON data with Scrapy

Tags: python scrapy

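I've done some research, but I can't find anything on whether it is possible to scrape schema.org JSON data (JSON-LD) from a URL. Here is one example I came across while looking at a product: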

https://www.reevoo.com/p/panasonic-nn-e271wmbpq

    <script class="microdata-snippet" type="application/ld+json">
    {
      "@context": "http://schema.org/",
      "@type": "Product",
      "name": "PANASONIC NN-E271WMBPQ",
      "image": "https://images.reevoo.com/products/3530/3530797/550x550.jpg?fingerprint=73ed91807dac7eb8f899757a348c735446d0a1fe&gravity=Center",
      "category": {
        "@type": "Thing",
        "name": "Microwave",
        "url": "https://www.reevoo.com/browse/product_type/microwaves"
      },
      "description": "Auto weight programs will automatically calculate the cooking time, once the weight has been entered. Acrylic lining makes cleaning easy, simply wipe after use. Child lock provides extra security to prevent little fingers interfering with the programming of the oven. \nAll our compact microwave ovens are packed with flexible features to make everyday cooking simple. Auto weight programs will automatically calculate the cooking time, once the weight has been entered. Acrylic lining makes cleaning easy, simply wipe after use. Child lock provides extra security to prevent little fingers interfering with the programming of the oven.",
      "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "8.7",
        "ratingCount": 636,
        "worstRating": "1",
        "bestRating": "10"
      }
    }
    </script>
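Once the `<script>` wrapper is stripped, the payload above is ordinary JSON, so `json.loads` turns it into nested dicts. The following is a minimal standard-library sketch (not from the original post) that parses a trimmed copy of the snippet, with `image`, `category` and `description` dropped. On this page `ratingValue`, `worstRating` and `bestRating` are JSON strings while `ratingCount` is a number, so convert them before doing arithmetic.

```python
import json

# Trimmed copy of the JSON-LD payload above (image, category and description omitted)
payload = """
{
  "@context": "http://schema.org/",
  "@type": "Product",
  "name": "PANASONIC NN-E271WMBPQ",
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "8.7",
    "ratingCount": 636,
    "worstRating": "1",
    "bestRating": "10"
  }
}
"""

product = json.loads(payload)
rating = product["aggregateRating"]

# ratingValue/bestRating arrive as strings on this page; ratingCount is already an int
rating_value = float(rating["ratingValue"])
best = float(rating["bestRating"])
count = rating["ratingCount"]

print(f"{product['name']}: {rating_value}/{best:g} from {count} reviews")
```

This prints `PANASONIC NN-E271WMBPQ: 8.7/10 from 636 reviews`.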

So, is it possible to extract the rating data?

Thanks in advance.

Best answer

import json

Then, in your code:

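# Text of the first JSON-LD <script> on the page (.get() is the newer name for .extract_first())
microdata_content = response.xpath('//script[@type="application/ld+json"]/text()').get()
microdata = json.loads(microdata_content)

# The rating sits in the "aggregateRating" object; ratingValue is a string here ("8.7")
ratingValue = microdata["aggregateRating"]["ratingValue"]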
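The XPath above keeps only the first JSON-LD block. That breaks on pages that emit several blocks, for example an `Organization` block before the `Product` one. Below is a minimal sketch that scans every block and returns the first object carrying `aggregateRating`. It also handles the common list and `@graph` shapes. So that it runs without Scrapy, it uses the standard library's `html.parser` in place of Scrapy selectors. `LdJsonCollector` and `find_aggregate_rating` are hypothetical helper names, not part of Scrapy.

```python
import json
from html.parser import HTMLParser


class LdJsonCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so append rather than assign
        if self._inside:
            self.blocks[-1] += data


def find_aggregate_rating(blocks):
    """Return the first aggregateRating found in the given JSON-LD texts, or None."""
    for text in blocks:
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of crashing the whole parse
        # A block may hold one object, a list of objects, or an "@graph" list
        if isinstance(data, list):
            items = data
        elif isinstance(data, dict):
            items = data.get("@graph", [data])
        else:
            continue
        for item in items:
            if isinstance(item, dict) and "aggregateRating" in item:
                return item["aggregateRating"]
    return None


html = """
<html><head>
<script type="application/ld+json">{"@context": "http://schema.org/", "@type": "Organization", "name": "Example"}</script>
<script class="microdata-snippet" type="application/ld+json">
{"@type": "Product", "name": "PANASONIC NN-E271WMBPQ",
 "aggregateRating": {"@type": "AggregateRating", "ratingValue": "8.7", "ratingCount": 636}}
</script>
</head></html>
"""

collector = LdJsonCollector()
collector.feed(html)
collector.close()
rating = find_aggregate_rating(collector.blocks)
print(rating["ratingValue"], rating["ratingCount"])
```

Inside a Scrapy callback you could skip the HTML parser and pass the selector results straight to the helper, e.g. `find_aggregate_rating(response.xpath('//script[@type="application/ld+json"]/text()').getall())`, since `.getall()` returns every match as a list of strings.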

For the original question "python - Scraping LD+JSON data with Scrapy" on Stack Overflow, see: https://stackoverflow.com/questions/51232247/
