pdf - Tess4J - Native library (linux-x86-64/libtesseract.so) not found in resource path

Tags: pdf tesseract ghostscript tess4j

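I am using Tess4J (a JNA wrapper around Tesseract) and trying to call tess.doOCR(myFile) to OCR a single-page PDF into text.

I installed Ghostscript (with yum install ghostscript), and gs -h works fine.

My application server runs a 64-bit JVM, and I have gsdll64.dll, along with the 64-bit Tesseract DLLs liblept168.dll and libtesseract302.dll, on the classpath.

When tess.doOCR(myFile) is called, this is logged: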

GPL Ghostscript 8.70 (2014-09-22)
Copyright (C) 2014 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1

But it just stops there. The program does not proceed any further.

Update:

It looks like the real problem comes from this error:

java.lang.UnsatisfiedLinkError: Unable to load library 'tesseract': Native library (linux-x86-64/libtesseract.so) not found in resource path

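After a lot of searching, I have not found a convenient place to get this libtesseract.so file, and I am not sure how to get it onto my Linux application server. I read that I might need to download some C++ runtime, but I did not see a Linux download. Any suggestions would be appreciated.

Or does this have something to do with symbolic links?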

Best answer

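The fix was simple for me: just run sudo apt-get install tesseract-ocr from the command line. On Linux, you do not need to worry about DLL libraries or the JVM version. Installing Tesseract through apt-get is enough.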
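
If the error persists after installing the package, it may help to check whether a libtesseract shared object is actually present in a directory the JVM can reach. The following is a minimal sketch, not part of Tess4J: the class name NativeLibLocator, the method findLibraryDir, and the directory list are hypothetical choices for illustration, and the real install location depends on the distribution (the question's system uses yum, where package names and paths may differ from apt-get systems). The only library-specific detail it relies on is the JNA system property jna.library.path, which has to be set before Tess4J first loads the native library.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class NativeLibLocator {

    // Assumption: directories where distribution packages commonly place
    // shared libraries. Adjust this list for the actual server.
    static final List<Path> DEFAULT_DIRS = Arrays.asList(
            Paths.get("/usr/lib"),
            Paths.get("/usr/lib64"),
            Paths.get("/usr/local/lib"),
            Paths.get("/usr/lib/x86_64-linux-gnu"));

    // Returns the first directory containing a file named lib<baseName>.so*,
    // for example libtesseract.so or libtesseract.so.3.
    static Optional<Path> findLibraryDir(String baseName, List<Path> dirs) {
        String glob = "lib" + baseName + ".so*";
        for (Path dir : dirs) {
            if (!Files.isDirectory(dir)) {
                continue;
            }
            try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, glob)) {
                if (matches.iterator().hasNext()) {
                    return Optional.of(dir);
                }
            } catch (IOException e) {
                // Unreadable directory: skip it and try the next candidate.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Path> dir = findLibraryDir("tesseract", DEFAULT_DIRS);
        if (dir.isPresent()) {
            // Point JNA at the directory before any Tess4J class loads the library.
            System.setProperty("jna.library.path", dir.get().toString());
            System.out.println("Found libtesseract in " + dir.get());
        } else {
            System.out.println("libtesseract.so not found; install the Tesseract package first");
        }
    }
}
```

Running main on the server before deploying gives a quick yes/no answer about whether the native library is installed at all. The same property can also be passed on the JVM command line as -Djna.library.path=<directory> instead of being set in code.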

Source: this question and answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/26577644/
