我试图在输入查询并单击按钮后从页面中提取特定元素。该页面不会导航到新的 URL:它只是返回我需要提取的新 HTML 内容。
这描述了我已经走了多远:
const puppeteer = require('puppeteer');
function timeout(ms) {
return new Promise(resolve => setTimeout(resolve, ms));
};
const input_val = 'some query text';
(async() => {
const browser = await puppeteer.launch()
const page = await browser.newPage()
await page.goto('http://target.com', { waitUntil: 'networkidle2' })
await page.waitFor('input[name=query]')
await page.evaluate((input_val) => {
document.querySelector('input[name=query]').value = input_val;
document.querySelector('.Button').click();
}, input_val)
// Now I want to console.log the <strong> tag fields
// innerText (will be 0-3 matching elements).
// The lines below describe in non-puppeteer what
// I need to do. But this has no effect.
const strongs = await page.$$('strong')
for(var i=0; i<strongs.length; i++) {
console.log(strongs[i].innerText);
}
await timeout(2000)
await page.screenshot({path: 'example.png'}) // this renders results page ok
browser.close();
})();
因此输入查询被正确输入,按钮点击被触发,屏幕截图显示网页已按预期响应。我只是不知道如何提取和报告相关位。
我一直在努力了解整个异步/等待范式,但我对它还是很陌生。非常感谢您的帮助。
编辑 - Vaviloff 方法错误:
(node:67405) UnhandledPromiseRejectionWarning: Error: Protocol error (Runtime.callFunctionOn): Cannot find context with specified id undefined
at Promise (/Users/user/node_modules/puppeteer/lib/Connection.js:200:56)
at new Promise (<anonymous>)
at CDPSession.send (/Users/user/node_modules/puppeteer/lib/Connection.js:199:12)
at ExecutionContext.evaluateHandle (/Users/user/node_modules/puppeteer/lib/ExecutionContext.js:79:75)
at ExecutionContext.evaluate (/Users/user/node_modules/puppeteer/lib/ExecutionContext.js:46:31)
at Frame.evaluate (/Users/user/node_modules/puppeteer/lib/FrameManager.js:326:20)
at <anonymous>
at process._tickCallback (internal/process/next_tick.js:160:7)
(node:67405) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:67405) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
最佳答案
有一个有用的辅助工具 page.$$eval :
This method runs
Array.from(document.querySelectorAll(selector))
within the page and passes it as the first argument to pageFunction.
因为它将数组传递给评估的函数,我们可以在其上使用 .map()
来提取所需的属性:
const strongs = await page.$$eval('strong', items => items.map( item => item.innerText));
更新 这是用于测试的完整工作脚本:
const puppeteer = require('puppeteer');
const input_val = '[puppeteer]';
const items_selector = '.question-hyperlink';
(async() => {
const browser = await puppeteer.launch({
headless: false,
})
const page = await browser.newPage()
await page.goto('https://stackoverflow.com/', { waitUntil: 'networkidle2' })
await page.waitFor('input[name=q]')
await page.type('input[name=q]', input_val + '\r');
await page.waitForNavigation();
const items = await page.$$eval(items_selector, items => items.map( item => item.innerText));
console.log(items);
await browser.close();
})();
更新 2
https://diplodata.shinyapps.io/puppeteer-test/ 处沙箱脚本的修改版本
const puppeteer = require('puppeteer');
const input_val = 'puppeteer';
(async() => {
const browser = await puppeteer.launch({
headless: false,
})
const page = await browser.newPage()
await page.goto('https://diplodata.shinyapps.io/puppeteer-test/', { waitUntil: 'networkidle2' })
await page.waitFor('#query')
await page.type('#query', input_val);
await page.click('#go');
await page.waitFor(500);
const items = await page.$$eval('strong', items => items.map( item => item.innerText));
console.log(items);
await browser.close();
})();
产生以下结果:
[ 'On click below should read:', '<query>', 'puppeteer ' ]
关于selenium-chromedriver - puppeteer 操纵者在点击事件后识别元素内容,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/50731495/