selenium-chromedriver - Puppeteer: identifying element content after a click event

Tags: selenium-chromedriver puppeteer

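I am trying to extract specific elements from a page after entering a query and clicking a button. The page does not navigate to a new URL: it simply returns new HTML content, which is what I need to extract.

This is how far I have got: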

const puppeteer = require('puppeteer');

function timeout(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
};

const input_val = 'some query text';

(async() => {
    const browser = await puppeteer.launch()
    const page = await browser.newPage()
    await page.goto('http://target.com', { waitUntil: 'networkidle2' })
    await page.waitFor('input[name=query]')

    await page.evaluate((input_val) => {
      document.querySelector('input[name=query]').value = input_val;
      document.querySelector('.Button').click();
    }, input_val)

    // Now I want to console.log the <strong> tag fields 
    // innerText (will be 0-3 matching elements).
    // The lines below describe in non-puppeteer what 
    // I need to do. But this has no effect.

    const strongs = await page.$$('strong')
    for(var i=0; i<strongs.length; i++) {
      console.log(strongs[i].innerText);
    }

    await timeout(2000)
    await page.screenshot({path: 'example.png'}) // this renders results page ok

    browser.close();
})();

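So the query is entered correctly, the button click is triggered, and the screenshot shows that the page has responded as expected. I just can't work out how to extract and report the relevant bits.

I have been trying to get my head around the whole async/await paradigm, but I am still quite new to it. Any help is much appreciated.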


Edit - error when using Vaviloff's approach:

(node:67405) UnhandledPromiseRejectionWarning: Error: Protocol error (Runtime.callFunctionOn): Cannot find context with specified id undefined
    at Promise (/Users/user/node_modules/puppeteer/lib/Connection.js:200:56)
    at new Promise (<anonymous>)
    at CDPSession.send (/Users/user/node_modules/puppeteer/lib/Connection.js:199:12)
    at ExecutionContext.evaluateHandle (/Users/user/node_modules/puppeteer/lib/ExecutionContext.js:79:75)
    at ExecutionContext.evaluate (/Users/user/node_modules/puppeteer/lib/ExecutionContext.js:46:31)
    at Frame.evaluate (/Users/user/node_modules/puppeteer/lib/FrameManager.js:326:20)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:160:7)
(node:67405) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:67405) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Accepted answer

There is a useful helper, page.$$eval:

This method runs Array.from(document.querySelectorAll(selector)) within the page and passes it as the first argument to pageFunction.

Because it passes an array to the evaluated function, we can call .map() on it to extract the property we need:

const strongs = await page.$$eval('strong', items => items.map( item => item.innerText));
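For comparison, the loop in the question prints undefined because page.$$ resolves to ElementHandle objects, which are references to nodes living inside the browser rather than the DOM nodes themselves. A minimal sketch of how that approach could be made to work, assuming you would rather keep the handles: pass each handle back into the page with page.evaluate. The helper name textsFromHandles is hypothetical and not part of Puppeteer.

```javascript
// Sketch: read innerText through the ElementHandles returned by page.$$().
// `textsFromHandles` is a hypothetical helper name; `page` is a Puppeteer Page.
async function textsFromHandles(page, selector) {
  const handles = await page.$$(selector);
  const texts = [];
  for (const handle of handles) {
    // handle.innerText is undefined in Node, because the handle is only a
    // reference to the element. page.evaluate resolves it to the real element
    // inside the page, where innerText can be read and returned as a string.
    texts.push(await page.evaluate(el => el.innerText, handle));
  }
  return texts;
}
```

Used inside the async function from the question, this would look like `console.log(await textsFromHandles(page, 'strong'))`. The page.$$eval one-liner above does the same work in a single round trip, which is why it is usually the simpler choice.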

Update: here is a complete working script for testing:

const puppeteer = require('puppeteer');

const input_val = '[puppeteer]';
const items_selector = '.question-hyperlink';

(async() => {

    const browser = await puppeteer.launch({
        headless: false,
    })
    const page = await browser.newPage()

    await page.goto('https://stackoverflow.com/', { waitUntil: 'networkidle2' })
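    await page.waitForSelector('input[name=q]')
    // Start waiting for the navigation before pressing Enter,
    // so that a fast page load cannot be missed
    await Promise.all([
        page.waitForNavigation(),
        page.type('input[name=q]', input_val + '\r'),
    ]);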

    const items = await page.$$eval(items_selector, items => items.map( item => item.innerText));

    console.log(items);

    await browser.close();
})();

Update 2
A modified version of the script for the sandbox at https://diplodata.shinyapps.io/puppeteer-test/

const puppeteer = require('puppeteer');
const input_val = 'puppeteer';

(async() => {

    const browser = await puppeteer.launch({
        headless: false,
    })
    const page = await browser.newPage()

    await page.goto('https://diplodata.shinyapps.io/puppeteer-test/', { waitUntil: 'networkidle2' })
    await page.waitForSelector('#query')
    await page.type('#query', input_val);
    await page.click('#go');
    // Fixed pause so the page has time to render the result
    await new Promise(resolve => setTimeout(resolve, 500));
    const items = await page.$$eval('strong', items => items.map( item => item.innerText));

    console.log(items);

    await browser.close();
})();

This produces the following result:

[ 'On click below should read:', '<query>', 'puppeteer ' ]
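The fixed 500 ms pause in the script above works for the sandbox but may be too short on a slow connection. One possible alternative, sketched here under the assumption that the click adds at least one new element matching the selector, is to wait with page.waitForFunction until the number of matches grows. The helper name clickAndCollect and the 5000 ms default timeout are illustrative choices, not something from the original answer.

```javascript
// Sketch: click, then wait for the DOM to change instead of sleeping.
// `clickAndCollect` is a hypothetical helper; `page` is a Puppeteer Page.
// Assumption: the click adds at least one new element matching itemSelector.
async function clickAndCollect(page, buttonSelector, itemSelector, timeoutMs = 5000) {
  // Count the matches before the click so new ones can be detected afterwards.
  const before = await page.$$eval(itemSelector, els => els.length);
  await page.click(buttonSelector);
  // Poll inside the page until more matches exist than before the click.
  // Rejects with a timeout error if nothing new appears within timeoutMs.
  await page.waitForFunction(
    (sel, n) => document.querySelectorAll(sel).length > n,
    { timeout: timeoutMs },
    itemSelector,
    before
  );
  return page.$$eval(itemSelector, els => els.map(el => el.innerText));
}
```

In the script above this would replace the click, the pause, and the $$eval call with `const items = await clickAndCollect(page, '#go', 'strong');`. If the page updates an existing element rather than adding one, the predicate would need to check the element's text instead of the count.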

A similar question on this topic can be found on Stack Overflow: https://stackoverflow.com/questions/50731495/
