java - Tesseract - 错误 net.sourceforge.tess4j.Tesseract - 空

标签 java tomcat ocr tesseract tess4j

创建了一个使用 Tesseract 的 java 应用程序,以便将给定的图像或 pdf 转换为字符串格式,当它在我的机器上作为使用 junit 的单元测试运行时它运行得很好但是当运行完整的系统时它是一个 restFul API由接收图像并运行 Tesseract 的 tomcat 运行,它给我以下错误:

23:22:36.511 [http-nio-9999-exec-3] ERROR net.sourceforge.tess4j.Tesseract - null java.lang.NullPointerException: null at net.sourceforge.tess4j.util.PdfUtilities.convertPdf2Png(PdfUtilities.java:107) at net.sourceforge.tess4j.util.PdfUtilities.convertPdf2Tiff(PdfUtilities.java:48) at net.sourceforge.tess4j.util.ImageIOHelper.getIIOImageList(ImageIOHelper.java:343) at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:213) at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:197) at ocr.OcrUtil.getString(OcrUtil.java:54) at com.tapd.server.api.handlers.IRSHandler.uploadIRSImage(IRSHandler.java:65) at com.tapd.server.api.WebAPIService.updateParentIrsForm(WebAPIService.java:250) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161) at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:160) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99) at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389) at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347) at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102) at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:309) at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) at org.glassfish.jersey.internal.Errors.process(Errors.java:315) at org.glassfish.jersey.internal.Errors.process(Errors.java:297) at org.glassfish.jersey.internal.Errors.process(Errors.java:267) at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317) at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:292) at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1139) at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:460) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:386) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:334) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:221) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:230) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165) at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:192) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:108) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:522) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79) at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:620) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:349) at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:1110) at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66) at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:785) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1425) at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.lang.Thread.run(Unknown Source) [2016-09-14 23:22:36,512] [ERROR] java.lang.NullPointerException

我的猜测是 tessdata 文件夹不在正确的位置,当打包到 Jar 中并由 tomcat 运行时,它放错了位置,但我无法弄清楚它应该位于何处,我已经仔细检查以查看所有 Jar 都已正确部署。

编辑:所以当 Tesseract 在远程服务器(如 AWS S3)上时,它似乎无法处理路径,所以问题是为什么?以及如何允许它使用来自 S3 的路径? (是的,文件是公开的)

最佳答案

我的猜测是没有正确记录 GhostscriptException,这会导致 NullPointerException:

https://github.com/nguyenq/tess4j/blob/212d72bc2ec8b3a4d4f5a18f1eb01a0622fc5521/src/main/java/net/sourceforge/tess4j/util/PdfUtilities.java#L107

106        } catch (GhostscriptException e) {
107            logger.error(e.getCause().toString(), e);
108        } finally {

第 107 行 - e.getCause()(可能)为 null,调用 null.toString() 会抛出 NPE。

(来自规范 - getCause 可以为空: https://docs.oracle.com/javase/7/docs/api/java/lang/Throwable.html#getCause() ,GhostscriptException 也允许原因为空:http://grepcode.com/file/repo1.maven.org/maven2/org.ghost4j/ghost4j/1.0.0/org/ghost4j/GhostscriptException.java )

要验证这个答案(无需重新编译整个 tess4j),您可以在 Debug模式下启动您的程序并在第 107 行放置一个断点。这将为您提供有关真正异常的信息。

关于java - Tesseract - 错误 net.sourceforge.tess4j.Tesseract - 空,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/39504263/

相关文章:

java - 如何要求泛型参数是实现接口(interface)的枚举?

spring-mvc - 带有 Thymeleaf 和 Tomcat 8 UTF-8 编码问题的 Spring MVC

Tomcat 6 为加载类添加特定于 webapp 的目录

android - Vuforia文字识别和OCR的区别?

java - 在拥挤的图像中使用 opencv 确定文本区域

java - 从 HashMap<Enum, Object> 返回随机值

java - 当我检查空概率时,检查显示空指针异常

python - 改进云视觉上的手写 OCR?

java - 如何限制玩家 Activity ?

tomcat - 即使在禁用 tls 后 SSLV3 也无法正常工作