java - How to read PDF content in Selenium

Tags: java html selenium selenium-webdriver pdf-parsing

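I am trying to verify the content of a PDF. I get the URL from the link's href and pass it to the code below. The URL uses HTTPS, and I run into the following problem. Can anyone help me work out how to proceed and read the PDF data? Thanks in advance.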

The URL being retried is https://XXXXXXXXXXXXXXXXX/XXXX/XXXXXXXXXXXX?docType=pdf&docid=2229123

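        // Assumes PDFBox 2.x. The stream is wrapped because a BufferedInputStream
        // cannot be cast to RandomAccessRead (both wrapper types are in org.apache.pdfbox.io).
        URL PDFUrl = new URL(url);
        BufferedInputStream TestFile = new BufferedInputStream(PDFUrl.openStream());
        PDFParser TestPDF = new PDFParser(new RandomAccessBufferedFileInputStream(TestFile));
        TestPDF.parse();
        String TestText = new PDFTextStripper().getText(TestPDF.getPDDocument());
        System.out.println("Document Text is   " + TestText);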

The error is:

java.net.ConnectException: Connection timed out: connect
    at java.net.DualStackPlainSocketImpl.connect0(Native Method)
    at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
    at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
    at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
    at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
    at java.net.PlainSocketImpl.connect(Unknown Source)
    at java.net.SocksSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at sun.security.ssl.SSLSocketImpl.connect(Unknown Source)
    at sun.security.ssl.BaseSSLSocketImpl.connect(Unknown Source)
    at sun.net.NetworkClient.doConnect(Unknown Source)
    at sun.net.www.http.HttpClient.openServer(Unknown Source)
    at sun.net.www.http.HttpClient.openServer(Unknown Source)
    at sun.net.www.protocol.https.HttpsClient.<init>(Unknown Source)
    at sun.net.www.protocol.https.HttpsClient.New(Unknown Source)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
    at java.net.URL.openStream(Unknown Source)

Best answer

Have you set the driver's desired capabilities to accept SSL certificates?

        // Selenium 3.x API; DesiredCapabilities.chrome() was removed in Selenium 4.
        DesiredCapabilities dc = DesiredCapabilities.chrome();
        dc.setCapability(CapabilityType.ACCEPT_SSL_CERTS, true);
        WebDriver driver = new ChromeDriver(dc);
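
One caveat worth checking: the stack trace in the question fails inside `URL.openStream()` at the TCP connect step, which is a plain JVM connection made outside the browser, so driver capabilities may not change that error on their own. The sketch below uses only the JDK and shows one way to download the PDF bytes with explicit timeouts and the browser session's cookies. The class name `PdfDownloadSketch`, its method names, and the timeout parameter are hypothetical, and the cookie map is assumed to be filled from `driver.manage().getCookies()`. If the JVM has to go through a proxy to reach the host, that still needs to be configured separately.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Selenium or PDFBox.
class PdfDownloadSketch {

    /** Joins cookie name/value pairs into a single Cookie request header value. */
    public static String cookieHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> e : cookies.entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    /** Returns true if the data starts with the "%PDF-" signature. */
    public static boolean looksLikePdf(byte[] data) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    /** Downloads the response body, failing fast instead of hanging on connect. */
    public static byte[] download(String url, Map<String, String> cookies, int timeoutMillis)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        if (!cookies.isEmpty()) {
            // Reuse the browser session so a login-protected document is served.
            conn.setRequestProperty("Cookie", cookieHeader(cookies));
        }
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        } finally {
            conn.disconnect();
        }
    }
}
```

If `looksLikePdf` returns true for the downloaded bytes, they can be passed to `PDDocument.load(byte[])` in PDFBox 2.x and then to `PDFTextStripper.getText`. A false result usually means the server returned a login or error page instead of the document.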

Regarding "java - How to read PDF content in Selenium", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/60630496/
