node.js - How to read the text inside a tag

Tags: node.js xpath puppeteer playwright

I need to scrape this page: https://www.arabam.com/ilan/galeriden-satilik-lamborghini-gallardo-lp-560-4/mini-motors-dan-2009-gallardo-lp560-4-seramik-lift-bayi-boyasiz/14934711

If you scroll down, you will see this: [screenshot]

I scrolled down the page and copied the XPath. This is the XPath: //div[@id="js-hook-description"]//p/text

This is the code:

const results = xpathT.fromPageSource(data).findElements(rest);

//console.log("The href value is:", results[0].getAttribute("href"));
console.log(`Your full text is "${results[0].getText()}"`);
if (results.length > 0) {
  let _results = [];
  // collect href attributes when the XPath targets links
  if (path.includes("href", 0)) {
    for (let r of results) {
      _results.push(r.getAttribute("href"));
    }
  }
  // collect text when the XPath targets text
  if (path.includes("text", 0)) {
    //console.log("inside");
    //console.log(results);
    for (let r of results) {
      console.log(r.getText());
      _results.push(r.getText());
    }
  }
}

When I simply print the results, I get the following:

Your full text is "<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="5" color="#ff0000">LAMBORGHİNİ GALLARDO LP560-4</font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="5" color="#ff0000">2009 MODEL - 38.000 KM</font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="4">DOĞUŞ OTO <font color="#ff0000">BAYİİ</font> ÇIKIŞLI</font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4"><br/></font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">AİRMATİC (LİFT)</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">SERAMİK FREN</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">GERİ GÖRÜŞ KAMERASI</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">PADDLESHİFT (F1)</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">2 BÖLGE KLİMA</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">DERİ KOLTUK</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">Bİ-ZENON FAR</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">YAĞMUR SENSÖRÜ</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">CD-USB-AUX-MP3</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4"><br/></font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">?</font></b></span></p>,<p style="text-align: 
center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="4">BOYA - HATA - TRAMER - HASAR KAYDI </font><font color="#ff0000"><font size="4"> </font><font size="5">YOKTUR</font></font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="5"><br/></font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="5">ARACIMIZIN TAMPONLARI DAHİL <font color="#ff0000">BOYASIZ</font></font></b><br/></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="4"><br/></font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="4">YEDEK ANAHTARI <font color="#ff0000">MEVCUTTUR</font></font></b><br/></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b>?</b></span><br/></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="5">DETAYLI BİLGİ İÇİN LÜTFEN ARAYINIZ</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="5" color="#ff0000"><br/></font></b></span></p>,<p style="text-align: cente...

But when I call .getText(), it returns undefined. What could be the solution?

Best answer

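You can get the innerText property of any DOM element with page.evaluate. If you need the text paragraph by paragraph, use the correct CSS selector for the <p> elements, which in this case is: #js-hook-description > div > p. The matching elements can be collected with the page.$$ method (the same as document.querySelectorAll() in the page context), and you can then iterate over them (see the for..of and Array.map variants below). In each iteration innerText is retrieved, and String.trim() is used to strip line breaks (e.g. \n) from the paragraphs.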

// full text content into one string
const fullText = await page.evaluate(el => el.innerText, await page.$('#js-hook-description'))
console.log(fullText)

// each paragraph into an array element I.
const textArray = []
const paragraphs = await page.$$('#js-hook-description > div > p')
for (const p of paragraphs) {
  const actualPara = await page.evaluate(el => el.innerText.trim(), p)
  textArray.push(actualPara)
}
console.log(JSON.stringify(textArray))

An alternative solution can be written with page.$$eval and Array.map:

// each paragraph into an array element II.
const alternativeSolution = await page.$$eval('#js-hook-description > div > p', paragraphs => paragraphs.map(p => p.innerText.trim()))
console.log(JSON.stringify(alternativeSolution))
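The snippets above assume that a `page` object already exists. As a minimal sketch of how the surrounding setup might look, assuming Puppeteer is installed (`npm install puppeteer`): the function names `extractParagraphs` and `scrape` are hypothetical, and the `waitForSelector` call is an added assumption in case the description block is rendered late, not something the original answer uses.

```javascript
// Hypothetical helper: returns the trimmed innerText of every element
// matching `selector` on an already opened Puppeteer page.
async function extractParagraphs(page, selector) {
  // Assumption: waiting first avoids an empty result if the block renders late.
  await page.waitForSelector(selector);
  return page.$$eval(selector, elements =>
    elements.map(el => el.innerText.trim())
  );
}

// Hypothetical wrapper: opens the URL, extracts the paragraphs, closes the browser.
async function scrape(url) {
  // Loaded lazily so extractParagraphs can be reused without launching a browser.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    return await extractParagraphs(page, '#js-hook-description > div > p');
  } finally {
    // Always release the browser, even if navigation or extraction throws.
    await browser.close();
  }
}
```

Calling `scrape(url).then(console.log)` with the listing URL from the question should yield an array like the one shown in the answer, provided the page still has the same structure.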

Full text output:

LAMBORGHİNİ GALLARDO LP560-4 2009 MODEL - 38.000 KM DOĞUŞ OTO BAYİİ ÇIKIŞLI AİRMATİC (LİFT) SERAMİK FREN GERİ GÖRÜŞ KAMERASI PADDLESHİFT (F1) 2 BÖLGE KLİMA DERİ KOLTUK Bİ-ZENON FAR YAĞMUR SENSÖRÜ CD-USB-AUX-MP3 ? BOYA - HATA - TRAMER - HASAR KAYDI  YOKTUR ARACIMIZIN TAMPONLARI DAHİL BOYASIZ YEDEK ANAHTARI MEVCUTTUR ? DETAYLI BİLGİ İÇİN LÜTFEN ARAYINIZ 0533 239 22 77

Array output, one paragraph per element:

["LAMBORGHİNİ GALLARDO LP560-4","2009 MODEL - 38.000 KM","DOĞUŞ OTO BAYİİ ÇIKIŞLI","","AİRMATİC (LİFT)","SERAMİK FREN","GERİ GÖRÜŞ KAMERASI","PADDLESHİFT (F1)","2 BÖLGE KLİMA","DERİ KOLTUK","Bİ-ZENON FAR","YAĞMUR SENSÖRÜ","CD-USB-AUX-MP3","","?","BOYA - HATA - TRAMER - HASAR KAYDI  YOKTUR","","ARACIMIZIN TAMPONLARI DAHİL BOYASIZ","","YEDEK ANAHTARI MEVCUTTUR","?","DETAYLI BİLGİ İÇİN LÜTFEN ARAYINIZ","","0533 239 22 77"]
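The array contains empty strings (from paragraphs that hold only a <br/>) and lone "?" entries (presumably characters that did not survive encoding). If those are unwanted, a post-processing step along these lines could remove them. This is only a sketch: `cleanParagraphs` and its `placeholders` parameter are hypothetical names, and treating "?" as noise is an assumption about this particular listing.

```javascript
// Hypothetical helper: collapses repeated whitespace and drops entries that
// are empty or consist only of a placeholder such as "?".
function cleanParagraphs(paragraphs, placeholders = ['?']) {
  return paragraphs
    .map(text => text.replace(/\s+/g, ' ').trim())
    .filter(text => text !== '' && !placeholders.includes(text));
}

// A few entries taken from the array output above
const sample = [
  'LAMBORGHİNİ GALLARDO LP560-4',
  '',
  '?',
  'BOYA - HATA - TRAMER - HASAR KAYDI  YOKTUR',
];

console.log(JSON.stringify(cleanParagraphs(sample)));
// prints ["LAMBORGHİNİ GALLARDO LP560-4","BOYA - HATA - TRAMER - HASAR KAYDI YOKTUR"]
```

Note that the double space before "YOKTUR" is collapsed to a single space as a side effect of the whitespace normalization.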

This question, "node.js - How to read the text inside a tag", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/62934026/
