我已经做了一些研究,但似乎找不到任何关于是否可以从 URL 中抓取 JSON 架构数据之类的信息。无论如何,我刚刚在查看产品时发现的一个示例是:
https://www.reevoo.com/p/panasonic-nn-e271wmbpq
<script class="microdata-snippet" type="application/ld+json">
{
"@context": "http://schema.org/",
"@type": "Product",
"name": "PANASONIC NN-E271WMBPQ",
"image": "https://images.reevoo.com/products/3530/3530797/550x550.jpg?fingerprint=73ed91807dac7eb8f899757a348c735446d0a1fe&gravity=Center"
,"category": {
"@type": "Thing",
"name": "Microwave",
"url": "https://www.reevoo.com/browse/product_type/microwaves"
}
,"description": "Auto weight programs will automatically calculate the cooking time, once the weight has been entered. Acrylic lining makes cleaning easy, simply wipe after use. Child lock provides extra security to prevent little fingers interfering with the programming of the oven. \nAll our compact microwave ovens are packed with flexible features to make everyday cooking simple. Auto weight programs will automatically calculate the cooking time, once the weight has been entered. Acrylic lining makes cleaning easy, simply wipe after use. Child lock provides extra security to prevent little fingers interfering with the programming of the oven."
,"aggregateRating": {
"@type": "AggregateRating",
"ratingValue": "8.7",
"ratingCount": 636,
"worstRating": "1",
"bestRating": "10"
}
}
</script>
那么是否可以提取评分数据?
提前致谢
最佳答案
import json
接下来在您的代码中:
microdata_content = response.xpath('//script[@type="application/ld+json"]/text()').extract_first()
microdata = json.loads(microdata_content)
ratingValue = microdata["aggregateRating"]["ratingValue"]
关于python - Scrapy 爬取 LD JSON 数据,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/51232247/