javascript - Unable to scrape an image with Cheerio/node.js

Tags: javascript node.js image screen-scraping cheerio

My problem is quite simple: I'm trying to console.log the URL of the image from the Amazon link below, ideally through a more precise selection.

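I've spent most of the day trying to select the link's id/class, but the closest I can get is #imgTagWrapperId, which returns a lot of redundant information. In theory, I should be able to narrow that down by grabbing the link with a regular expression, but for the life of me I can only seem to replace the text I get back, not simply extract it. Alternatively, as mentioned above, I tried to get the img src itself, and that just returns a nonsensical string of code. The same blob of text shows up when I view the page source, but not when I inspect the element directly.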

const request = require('request');
const cheerio = require('cheerio');

request(`http://amazon.com/dp/B079H6RLKQ`, (error,response,html) =>{
    if (!error && response.statusCode ==200) {
        const $ = cheerio.load(html);
        const productTitle = $("#productTitle").text().replace(/\s\s+/g, '');

        const prodImg = $(`#imgTagWrapperId`).html();

        console.log(productTitle);

        console.log(prodImg);
    } else {
        console.log(error);
    }
})

The current code returns the product title correctly, but prodImg outputs:

<img alt="Samsung Galaxy S9 G960U 64GB Unlocked 4G LTE Phone w/ 12MP Camera - Midnight Black" src="
 

...(this nonsense goes on for a mile) ....

" data-old-hires="https://images-na.ssl-images-amazon.com/images/I/81%2Bh9mpyQmL._SL1500_.jpg"  class="a-dynamic-image  a-stretch-horizontal" id="landingImage" data-a-dynamic-image="{&quot;https://images-na.ssl-images-amazon.com/images/I/81%2Bh9mpyQmL._SX522_.jpg&quot;:[564,522],&quot;https://images-na.ssl-images-amazon.com/images/I/81%2Bh9mpyQmL._SX342_.jpg&quot;:[369,342],&quot;https://images-na.ssl-images-amazon.com/images/I/81%2Bh9mpyQmL._SX679_.jpg&quot;:[733,679],&quot;https://images-na.ssl-images-amazon.com/images/I/81%2Bh9mpyQmL._SX425_.jpg&quot;:[459,425],&quot;https://images-na.ssl-images-amazon.com/images/I/81%2Bh9mpyQmL._SX466_.jpg&quot;:[503,466],&quot;https://images-na.ssl-images-amazon.com/images/I/81%2Bh9mpyQmL._SX569_.jpg&quot;:[615,569],&quot;https://images-na.ssl-images-amazon.com/images/I/81%2Bh9mpyQmL._SX385_.jpg&quot;:[416,385]}" style="max-width:679px;max-height:733px;">
            </div>

Thanks in advance for any help and guidance. I've exhausted all my other usual sources and am prepared to be called an idiot.

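Edit:

Someone asked for the HTML before and after the selection, so here it is, though it's probably easier to view the page source at the link and press Ctrl+F. Wall of text follows.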

<div class="variationUnavailable unavailableExp">
    <div class="inner">
        
        <div class="a-box a-alert a-alert-error" aria-live="assertive" role="alert"><div class="a-box-inner a-alert-container"><h4 class="a-alert-heading">Image Unavailable</h4><i class="a-icon a-icon-alert"></i><div class="a-alert-content">
            <span class="a-text-bold">
                Image not available for<br/>Color:
                <span class="unvailableVariation"></span>
            </span>
        </div></div></div>
    </div>
</div>




<!-- Append onload function to stretch image on load to avoid flicker when transitioning from low res image from Mason to large image variant in desktop -->
<!-- any change in onload function requires a corresponding change in Mason to allow it pass in /mason/amazon-family/gp/product/features/embed-features.mi -->
<!-- and /mason/amazon-family/gp/product/features/embed-landing-image.mi -->



<ul class="a-unordered-list a-nostyle a-horizontal list maintain-height">

        <span id="imageBlockEDPOverlay"></span>



	<li class="image item itemNo0 selected maintain-height"><span class="a-list-item">
	    <span class="a-declarative" data-action="main-image-click" data-main-image-click="{}">
	        <div id="imgTagWrapperId" class="imgTagWrapper">
	            <img alt="Samsung Galaxy S9 G960U 64GB Unlocked 4G LTE Phone w/ 12MP Camera - Midnight Black" src="








" data-old-hires="https://images-na.ssl-images-amazon.com/images/I/81%2Bh9mpyQmL._SL1500_.jpg"  class="a-dynamic-image  a-stretch-horizontal" id="landingImage" data-a-dynamic-image="{&quot;https://images-na.ssl-images-amazon.com/images/I/81%2Bh9mpyQmL._SX522_.jpg&quot;:[564,522],&quot;https://images-na.ssl-images-amazon.com/images/I/81%2Bh9mpyQmL._SX342_.jpg&quot;:[369,342],&quot;https://images-na.ssl-images-amazon.com/images/I/81%2Bh9mpyQmL._SX679_.jpg&quot;:[733,679],&quot;https://images-na.ssl-images-amazon.com/images/I/81%2Bh9mpyQmL._SX425_.jpg&quot;:[459,425],&quot;https://images-na.ssl-images-amazon.com/images/I/81%2Bh9mpyQmL._SX466_.jpg&quot;:[503,466],&quot;https://images-na.ssl-images-amazon.com/images/I/81%2Bh9mpyQmL._SX569_.jpg&quot;:[615,569],&quot;https://images-na.ssl-images-amazon.com/images/I/81%2Bh9mpyQmL._SX385_.jpg&quot;:[416,385]}" style="max-width:679px;max-height:733px;">
	        </div>
	    </span>
	</span></li>




<li class="mainImageTemplate template"><span class="a-list-item">
    <span class="a-declarative" data-action="main-image-click" data-main-image-click="{}">
        <div class="imgTagWrapper">
            <span class="placeHolder"></span>
        </div>
    </span>
</span></li>

Best answer

Thanks to Rishi Raj for the quick fix: $('#landingImage').attr('data-old-hires'). I had also added an unnecessary .html() to the const, which was getting in the way. Thanks again, everyone!

Regarding "javascript - Unable to scrape an image with Cheerio/node.js", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57348977/
