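Puppeteer returns an uppercase innerText value because of CSS

Using the correct selector, the evaluate function, and the innerText property, I am trying to extract the content of a div, for example: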

<div class="abc">Interesting stuff</div>

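However, a CSS class transforms the content to uppercase: INTERESTING STUFF

Is it normal for the innerText property to return the uppercase text instead of the "raw" text? Is there a way to get this "raw" text?

Best answer

You can use either of the following properties instead: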

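innerHTML returns the element's content as HTML markup, child tags included, so it takes longer.
textContent returns plain text without any HTML processing, so it is faster.

Example: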

innerHTML:

const text = await page.$eval('.abc', elem => elem.innerHTML); // returns 'Interesting stuff'

textContent:

const text = await page.$eval('.abc', elem => elem.textContent); // returns 'Interesting stuff'
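
The textContent call above can be wrapped in a small reusable helper. This is only a minimal sketch: the names `readTextContent` and `getRawText` are hypothetical and not part of Puppeteer, and `page` is assumed to be a Puppeteer Page that has already navigated to the document.

```javascript
// Runs in the browser context: returns the element's text as stored in the DOM,
// unaffected by CSS such as text-transform. Surrounding whitespace is trimmed.
const readTextContent = (elem) => (elem.textContent || '').trim();

// Hypothetical helper: `page` is assumed to be a Puppeteer Page.
// page.$eval rejects if no element matches `selector`.
const getRawText = (page, selector) => page.$eval(selector, readTextContent);

// Usage inside an async function:
//   const text = await getRawText(page, '.abc');
```

The page function is kept free of outer variables on purpose, because Puppeteer serializes it and runs it inside the page rather than in Node.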

From the API docs:

The HTML or XML fragment returned by innerHTML is generated based on the current contents of the element, so the markup and formatting of the returned fragment is likely not to match the original page markup.

textContent returns the text of every element in the node. In contrast, innerText is aware of styling and won’t return the text of “hidden” elements. Moreover, since innerText takes CSS styles into account, reading the value of innerText triggers a reflow to ensure up-to-date computed styles. (Reflows can be computationally expensive, and thus should be avoided when possible.)
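
To see the difference described in the docs on a concrete element, both properties can be read in a single call and compared. This is a rough sketch under the same assumptions as before: `readBothTexts` and `compareTexts` are hypothetical names, and `page` is assumed to be a Puppeteer Page.

```javascript
// Runs in the browser context: collects both views of the same element.
const readBothTexts = (elem) => ({
  rendered: elem.innerText, // text as displayed, after CSS is taken into account
  raw: elem.textContent,    // text as stored in the DOM, CSS ignored
});

// Hypothetical helper: `page` is assumed to be a Puppeteer Page.
const compareTexts = (page, selector) => page.$eval(selector, readBothTexts);

// Usage inside an async function:
//   const { rendered, raw } = await compareTexts(page, '.abc');
//   if (rendered !== raw) console.log('CSS changes the displayed text of .abc');
```

For the div in the question, `rendered` would be expected to come back uppercased while `raw` keeps the original casing.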

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/57216094/
