java - Selenium - driver.getPageSource() differs from the source viewed in the browser

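I am trying to use Selenium to capture the source of a given URL into an HTML file, but for some reason I am not getting the exact source that we see in the browser.

Below is my Java code for capturing the source into an HTML file: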

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

private static void getHTMLSourceFromURL(String url, String fileName) {

    WebDriver driver = new FirefoxDriver();

    try {
        driver.get(url);
        Thread.sleep(5000);   // crude fixed pause so that scripts can finish after the page loads

        // split the source reported by the driver into individual lines
        List<String> pageSource = new ArrayList<String>(Arrays.asList(driver.getPageSource().split("\n")));

        // write the lines to the file name passed in by the caller
        writeTextToFile(pageSource, fileName);

    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();   // preserve the interrupt status
        e.printStackTrace();
    } finally {
        // always close the browser, even if something above fails
        System.out.println("quitting webdriver");
        driver.quit();
    }
}

/**
 * Creates the file fileName under "./HTML Sources" if needed and appends the content to it.
 *
 * @param content  lines to write
 * @param fileName name of the output file
 */
private static void writeTextToFile(List<String> content, String fileName) {
    File dir = new File(".", "HTML Sources");
    // mkdirs() returns false when the directory could not be created
    if (!dir.exists() && !dir.mkdirs()) {
        System.err.println(dir + " could not be created");
        return;
    }

    File output = new File(dir, fileName);
    // FileWriter creates the file when it is missing; `true` appends to an existing file.
    // try-with-resources closes the writer even when an exception is thrown.
    try (PrintWriter pw = new PrintWriter(new FileWriter(output, true))) {
        for (String line : content) {
            pw.print(line);
            pw.print("\n");
        }
    } catch (IOException ioe) {
        ioe.printStackTrace();
    }
}

Can someone explain why this happens? How does WebDriver render the page, and how does the browser display the source?

Best answer

There are several places you can get the source from. You can try

String pageSource=driver.findElement(By.tagName("body")).getText();

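and see what happens. Note that getText() returns only the visible, rendered text of the body element, not its HTML markup.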

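Normally you do not need to wait for the page to load. Selenium does that automatically, unless parts of the page are filled in separately by JavaScript/Ajax.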
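
If the page does have such JavaScript parts, an explicit wait is usually more reliable than a fixed Thread.sleep(). Below is a minimal sketch, assuming Selenium 4 (where WebDriverWait takes a Duration). The class name PageSourceAfterLoad and the method pageSourceWhenReady are made up for illustration and do not come from the question.

```java
import java.time.Duration;

import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.support.ui.WebDriverWait;

// Hypothetical helper, not part of the original question.
final class PageSourceAfterLoad {

    /**
     * Waits until document.readyState is "complete" (or the timeout expires),
     * then returns the page source as reported by the driver.
     */
    static String pageSourceWhenReady(WebDriver driver, Duration timeout) {
        // Selenium 4 constructor; Selenium 3 takes the timeout in seconds as a long.
        WebDriverWait wait = new WebDriverWait(driver, timeout);

        // until() polls the condition and throws TimeoutException if it never holds.
        wait.until(d -> "complete".equals(
                ((JavascriptExecutor) d).executeScript("return document.readyState")));

        return driver.getPageSource();
    }
}
```

In the question's method this would replace the sleep, e.g. pageSourceWhenReady(driver, Duration.ofSeconds(10)) after driver.get(url). Keep in mind that readyState only covers the initial load. For content that arrives later through Ajax you would probably wait for a specific element instead, for example with ExpectedConditions.presenceOfElementLocated.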

You may want to add the differences you are seeing, so that we can understand what you really mean.
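
One rough way to collect those differences is to save the browser's source to a file by hand and compare it line by line with the file written from getPageSource(). The sketch below is only a naive positional comparison, not a real diff algorithm. The class SourceDiff and its method are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original question.
final class SourceDiff {

    /**
     * Compares two texts line by line and describes up to maxReports
     * positions where they differ. Lines are compared by position only.
     */
    static List<String> differingLines(List<String> browser, List<String> selenium, int maxReports) {
        List<String> report = new ArrayList<>();
        int longest = Math.max(browser.size(), selenium.size());
        for (int i = 0; i < longest && report.size() < maxReports; i++) {
            String a = i < browser.size() ? browser.get(i) : "<missing>";
            String b = i < selenium.size() ? selenium.get(i) : "<missing>";
            // Ignore leading/trailing whitespace so indentation changes are not reported.
            if (!a.trim().equals(b.trim())) {
                report.add("line " + (i + 1) + ": browser=[" + a + "] selenium=[" + b + "]");
            }
        }
        return report;
    }
}
```

Both inputs can be read with Files.readAllLines(Path). Because the comparison is positional, a single inserted line makes every following line show up as different, so the first few reported lines are the interesting ones.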

WebDriver does not render the page itself; it just presents the page as the browser sees it.
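
As far as I understand it, the browser's "View Source" shows the markup as it was received from the server, while getPageSource() reflects the document the browser has built from it, typically including changes made by scripts. To see the unprocessed markup from Java you could fetch the URL without any browser. Here is a minimal sketch, assuming Java 11+ for java.net.http. RawSourceFetcher and fetchRawHtml are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not part of the original question.
final class RawSourceFetcher {

    /** Fetches the response body for url without running any JavaScript. */
    static String fetchRawHtml(String url) {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt status
            throw new IllegalStateException("interrupted while fetching " + url, e);
        }
    }
}
```

This request does not send the browser's cookies or headers, so the server may answer with somewhat different markup than it sends to Firefox. Treat the result as an approximation of what "View Source" shows.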

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/19358658/
