java - Tesseract: Index out of bounds exceptions for OCR method

Tags: java tesseract tess4j

I am working on a Spring-MVC application in which I use Tesseract for OCR. I am getting an index out of bounds exception for the file I pass in. Any ideas?

Error log:

net.sourceforge.tess4j.TesseractException: java.lang.IndexOutOfBoundsException
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:215)
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:196)
    at com.tooltank.spring.service.GroupAttachmentsServiceImpl.testOcr(GroupAttachmentsServiceImpl.java:839)
    at com.tooltank.spring.service.GroupAttachmentsServiceImpl.lambda$addAttachment$0(GroupAttachmentsServiceImpl.java:447)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IndexOutOfBoundsException
    at javax.imageio.stream.FileCacheImageOutputStream.seek(FileCacheImageOutputStream.java:170)
    at net.sourceforge.tess4j.util.ImageIOHelper.getImageByteBuffer(ImageIOHelper.java:297)
    at net.sourceforge.tess4j.Tesseract.setImage(Tesseract.java:397)
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:290)
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:212)
    ... 4 more

Code:

 private String testOcr(String fileLocation, int attachId) {
        try {
            File imageFile = new File(fileLocation);
            BufferedImage img = ImageIO.read(imageFile);
            BufferedImage blackNWhite = new BufferedImage(img.getWidth(), img.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
            Graphics2D graphics = blackNWhite.createGraphics();
            graphics.drawImage(img, 0, 0, null);
            String identifier = String.valueOf(new BigInteger(130, random).toString(32));
            String blackAndWhiteImage = previewPath + identifier + ".png";
            File outputfile = new File(blackAndWhiteImage);
            ImageIO.write(blackNWhite, "png", outputfile);

            ITesseract instance = new Tesseract();
            // Point to one folder above tessdata directory, must contain training data
            instance.setDatapath("/usr/share/tesseract-ocr/");
            // ISO 639-3 standard
            instance.setLanguage("deu");
            String result = instance.doOCR(outputfile);
            result = result.replaceAll("[^a-zA-Z0-9öÖäÄüÜß@\\s]", "");
            Files.delete(new File(blackAndWhiteImage).toPath());
            GroupAttachments groupAttachments = this.groupAttachmentsDAO.getAttachmenById(attachId);
            System.out.println("OCR Result is "+result);
            if (groupAttachments != null) {
                saveIndexes(result, groupAttachments.getFileName(), null, groupAttachments.getGroupId(), false, attachId);
            }
            return result;
        } catch (Exception e) {
            e.printStackTrace();

        }
        return null;
    }

Thank you.

Best answer

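Due to a bug in Java Image IO (fixed in Java 9), the current version of the Java Tesseract wrapper (3.4.0 at the time of writing this answer) does not work with Java versions below 9. To run on a lower Java version, you could try the following fix to the Tesseract ImageIOHelper class. Simply copy the class into your project and apply the necessary changes, and it should work smoothly with both files and BufferedImages.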

Note: this version does not use the TIFF optimization that the original class uses; you can add it if your project requires it.

public static ByteBuffer getImageByteBuffer(RenderedImage image) throws IOException {
    // A BufferedImage can be handed to convertImageData as-is.
    if (image instanceof BufferedImage) {
        return convertImageData((BufferedImage) image);
    }
    // Otherwise, copy the RenderedImage into a BufferedImage with the same color model.
    ColorModel cm = image.getColorModel();
    int width = image.getWidth();
    int height = image.getHeight();
    WritableRaster raster = cm.createCompatibleWritableRaster(width, height);
    boolean isAlphaPremultiplied = cm.isAlphaPremultiplied();
    // Carry over any properties attached to the source image.
    Hashtable<String, Object> properties = new Hashtable<>();
    String[] keys = image.getPropertyNames();
    if (keys != null) {
        for (String key : keys) {
            properties.put(key, image.getProperty(key));
        }
    }
    BufferedImage result = new BufferedImage(cm, raster, isAlphaPremultiplied, properties);
    // Fill the raster backing the new image with the source pixel data.
    image.copyData(raster);
    return convertImageData(result);
}
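
Since the answer ties the failure to Java versions below 9, a project that has to run on several runtimes might want to check the version before deciding whether the patched helper is needed. The following is only a sketch: the class and method names (`JavaVersionCheck`, `majorVersion`, `needsImageIoWorkaround`) are hypothetical, and the Java 9 threshold is taken from the answer above rather than verified independently.

```java
final class JavaVersionCheck {

    private JavaVersionCheck() {
    }

    /**
     * Extracts the major version from a "java.specification.version" value.
     * Runtimes up to Java 8 report the legacy form "1.x" (for example "1.8");
     * Java 9 and later report the major number directly (for example "9" or "11").
     */
    static int majorVersion(String specVersion) {
        String v = specVersion.trim();
        if (v.startsWith("1.")) {
            v = v.substring(2); // legacy scheme: drop the leading "1."
        }
        int dot = v.indexOf('.');
        if (dot >= 0) {
            v = v.substring(0, dot); // ignore anything after the major number
        }
        return Integer.parseInt(v);
    }

    /**
     * Assumption from the answer: the Image IO bug affects runtimes older than Java 9.
     */
    static boolean needsImageIoWorkaround() {
        return majorVersion(System.getProperty("java.specification.version")) < 9;
    }

    public static void main(String[] args) {
        System.out.println("Image IO workaround needed: " + needsImageIoWorkaround());
    }
}
```

On a Java 8 runtime `needsImageIoWorkaround()` should return true, in which case the copied ImageIOHelper with the fix above would be the one to use; on Java 9 or later the stock class may be sufficient.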

Regarding java - Tesseract: Index out of bounds exceptions for OCR method, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44384477/
