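I'm trying to load a web page ( https://genpact.taleo.net/careersection/sgy_external_career_section/jobsearch.ftl?lang=en ) so I can scrape it with the HtmlUnit WebClient, but the content does not load correctly. For example, I can't find the "Apply" button. My WebClient setup is as follows: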
webClient.setCssErrorHandler(new DefaultCssErrorHandler());
webClient.setJavaScriptErrorListener(new DefaultJavaScriptErrorListener());
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
webClient.getCookieManager().setCookiesEnabled(true);
webClient.waitForBackgroundJavaScript(60000);
Can someone help me solve this?
Best answer
This works for me:
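import java.io.IOException;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

// The class name is illustrative; the original answer only showed main().
public class TaleoScraper {
    public static void main(String[] args) throws IOException {
        final String url = "https://genpact.taleo.net/careersection/sgy_external_career_section/jobsearch.ftl?lang=en";
        try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_60)) {
            HtmlPage page = webClient.getPage(url);
            // waitForBackgroundJavaScript has to be called after every action
            // (such as getPage), not once during client setup.
            // This page is really slow, so keep waiting until the last part of the
            // dynamic content (the pager) shows up. Note that this loop has no upper
            // bound and that the marker assumes asText() separates lines with "\r\n".
            while (!page.asText().contains("Previous\r\n1\r\n2\r\n3\r\n4\r\n")) {
                webClient.waitForBackgroundJavaScript(1_000);
            }
            System.out.println("-------------------------------------------------------------------------------");
            System.out.println(page.asText());
            System.out.println("-------------------------------------------------------------------------------");
        }
    }
}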
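One caveat with the loop above: if the pager text never appears (the site changes, or the text uses a different line separator on your platform), it spins forever. Below is a minimal sketch of a bounded version of the same polling idea, written with plain JDK classes so it runs without HtmlUnit. The names `BoundedWait`, `waitUntil` and `normalizeNewlines` are my own and not part of HtmlUnit. In the real scraper I would expect the condition to read `page.asText()` and the wait step to call `webClient.waitForBackgroundJavaScript(1_000)`, but treat that wiring as an assumption to verify against your HtmlUnit version.

```java
import java.util.function.BooleanSupplier;

public class BoundedWait {

    /**
     * Runs waitStep until condition holds or maxAttempts wait steps were made.
     * Returns true if the condition was met, false if we gave up.
     */
    public static boolean waitUntil(BooleanSupplier condition, Runnable waitStep, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            waitStep.run();
        }
        // One last check after the final wait step.
        return condition.getAsBoolean();
    }

    /** Turns "\r\n" and lone "\r" into "\n" so a text marker matches on any platform. */
    public static String normalizeNewlines(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        // Stand-in for the page text: it "finishes loading" on the third wait step.
        StringBuilder fakePage = new StringBuilder("Loading...");
        int[] steps = {0};

        boolean loaded = waitUntil(
                () -> normalizeNewlines(fakePage.toString()).contains("Previous\n1\n2\n"),
                () -> {
                    if (++steps[0] == 3) {
                        fakePage.append("\r\nPrevious\r\n1\r\n2\r\n");
                    }
                },
                60);

        System.out.println("loaded=" + loaded + " after " + steps[0] + " wait steps");
    }
}
```

With a one second wait step and `maxAttempts` of 60, this gives up after roughly a minute instead of hanging, and the caller can decide whether to log, retry, or fail when `waitUntil` returns false.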
Regarding javascript - dynamic content of a web page not fully loaded when using the HtmlUnit WebClient, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56255407/