I scroll down the page and then grab the XPath. This is the XPath: //div[@id="js-hook-description"]//p/text
Here is the code:
const results = xpathT.fromPageSource(data).findElements(rest);
//console.log("The href value is:", results[0].getAttribute("href"));
console.log(`Your full text is "${results[0].getText()}"`);
if (results.length > 0) {
  let _results = [];
  if (path.includes("href", 0)) {
    for (let r of results) {
      _results.push(r.getAttribute("href"));
    }
  }
  if (path.includes("text", 0)) {
    //console.log("inside");
    //console.log(results);
    for (let r of results) {
      console.log(r.getText());
      _results.push(r.getText());
    }
  }
}
When I simply print the results, I get the following:
Your full text is "<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="5" color="#ff0000">LAMBORGHİNİ GALLARDO LP560-4</font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="5" color="#ff0000">2009 MODEL - 38.000 KM</font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="4">DOĞUŞ OTO <font color="#ff0000">BAYİİ</font> ÇIKIŞLI</font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4"><br/></font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">AİRMATİC (LİFT)</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">SERAMİK FREN</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">GERİ GÖRÜŞ KAMERASI</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">PADDLESHİFT (F1)</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">2 BÖLGE KLİMA</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">DERİ KOLTUK</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">Bİ-ZENON FAR</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">YAĞMUR SENSÖRÜ</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">CD-USB-AUX-MP3</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4"><br/></font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="4">?</font></b></span></p>,<p style="text-align: 
center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="4">BOYA - HATA - TRAMER - HASAR KAYDI </font><font color="#ff0000"><font size="4"> </font><font size="5">YOKTUR</font></font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="5"><br/></font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="5">ARACIMIZIN TAMPONLARI DAHİL <font color="#ff0000">BOYASIZ</font></font></b><br/></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="4"><br/></font></b></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><b><font size="4">YEDEK ANAHTARI <font color="#ff0000">MEVCUTTUR</font></font></b><br/></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b>?</b></span><br/></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="5">DETAYLI BİLGİ İÇİN LÜTFEN ARAYINIZ</font></b></span></p>,<p style="text-align: center;" xmlns="http://www.w3.org/1999/xhtml"><span><b><font size="5" color="#ff0000"><br/></font></b></span></p>,<p style="text-align: cente...
But when I call .getText(), it returns undefined. What could be a possible solution?
Best answer
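You can use page.evaluate to get the innerText property of any DOM element. If you need the text paragraph by paragraph, use a CSS selector that matches the <p> elements, in this case: #js-hook-description > div > p. The matching elements can be collected with page.$$ (the equivalent of document.querySelectorAll() in the page context) and then iterated over (see the for..of and Array.map variants below). In each iteration innerText is retrieved, and String.trim() is applied to strip line breaks (e.g. \n) from the paragraphs.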
// full text content into one string
const fullText = await page.evaluate(el => el.innerText, await page.$('#js-hook-description'))
console.log(fullText)
// each paragraph into an array element I.
const textArray = []
const paragraphs = await page.$$('#js-hook-description > div > p')
for (const p of paragraphs) {
  const actualPara = await page.evaluate(el => el.innerText.trim(), p)
  textArray.push(actualPara)
}
console.log(JSON.stringify(textArray))
An alternative solution uses page.$$eval together with Array.map:
// each paragraph into an array element II.
const alternativeSolution = await page.$$eval('#js-hook-description > div > p', paragraphs => paragraphs.map(p => p.innerText.trim()))
console.log(JSON.stringify(alternativeSolution))
Output of the full text:
LAMBORGHİNİ GALLARDO LP560-4 2009 MODEL - 38.000 KM DOĞUŞ OTO BAYİİ ÇIKIŞLI AİRMATİC (LİFT) SERAMİK FREN GERİ GÖRÜŞ KAMERASI PADDLESHİFT (F1) 2 BÖLGE KLİMA DERİ KOLTUK Bİ-ZENON FAR YAĞMUR SENSÖRÜ CD-USB-AUX-MP3 ? BOYA - HATA - TRAMER - HASAR KAYDI YOKTUR ARACIMIZIN TAMPONLARI DAHİL BOYASIZ YEDEK ANAHTARI MEVCUTTUR ? DETAYLI BİLGİ İÇİN LÜTFEN ARAYINIZ 0533 239 22 77
Output of the array, one paragraph per element:
["LAMBORGHİNİ GALLARDO LP560-4","2009 MODEL - 38.000 KM","DOĞUŞ OTO BAYİİ ÇIKIŞLI","","AİRMATİC (LİFT)","SERAMİK FREN","GERİ GÖRÜŞ KAMERASI","PADDLESHİFT (F1)","2 BÖLGE KLİMA","DERİ KOLTUK","Bİ-ZENON FAR","YAĞMUR SENSÖRÜ","CD-USB-AUX-MP3","","?","BOYA - HATA - TRAMER - HASAR KAYDI YOKTUR","","ARACIMIZIN TAMPONLARI DAHİL BOYASIZ","","YEDEK ANAHTARI MEVCUTTUR","?","DETAYLI BİLGİ İÇİN LÜTFEN ARAYINIZ","","0533 239 22 77"]
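The array above still contains empty strings for paragraphs that hold only a <br/>. If those entries are not wanted, they can be filtered out in the same page.$$eval call. The following is a minimal sketch, not part of the original answer: getNonEmptyTexts is a hypothetical helper name, and it assumes it is given a Puppeteer page (or any object exposing a compatible $$eval method).

```javascript
// Hypothetical helper (not a Puppeteer API): returns the trimmed innerText of
// every element matching `selector`, skipping entries that are empty.
async function getNonEmptyTexts(page, selector) {
  // The callback is executed in the page context, so it must not rely on
  // variables from the surrounding Node.js scope.
  return page.$$eval(selector, elements =>
    elements
      .map(el => el.innerText.trim())
      .filter(text => text.length > 0)
  );
}
```

Inside an async function it could be called as: const texts = await getNonEmptyTexts(page, '#js-hook-description > div > p'). Whether to drop the blank entries depends on the use case, since they mark the visual spacing between blocks in the original description.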
Regarding "node.js - how to read the text inside a tag", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62934026/