android - Why does Tesseract for Android return garbage when using "equ.traineddata" to detect math symbols or equations?

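We want to develop an application that extracts text from images and also extracts and solves mathematical equations. We used the Tesseract OCR engine to extract text from images, but when we tried to extract equations, the results were disappointing. We had been using version 3.01 and suspected that was the cause, so we built the latest version of Tesseract from the repository https://github.com/rmtheis/tess-two. We use the official trained data file eng.traineddata to detect text, which works well, and equ.traineddata to detect mathematical symbols and equations, but the latter does not give the expected results.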
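For the "solve" half of the goal, OCR output only becomes useful once it is plain text such as `2x+3=7`. A minimal sketch for the simplest case follows, assuming a linear equation in a single variable `x` whose terms look like `2x`, `-x` or `3` (no parentheses, products or powers). `LinearEquationSolver` is a hypothetical helper, not part of Tesseract or tess-two, and malformed OCR output makes it throw rather than guess.

```java
/**
 * Hypothetical sketch: solves a linear equation in one variable x
 * (e.g. "2x+3=7") taken from OCR output. Only handles terms of the form
 * [sign][number]x or [sign][number]; no parentheses, products or powers.
 */
final class LinearEquationSolver {

    private LinearEquationSolver() {}

    static double solve(String equation) {
        // Normalize: drop whitespace and treat "X" like "x".
        String s = equation.replaceAll("\\s+", "").toLowerCase(java.util.Locale.ROOT);
        String[] sides = s.split("=", -1);
        if (sides.length != 2 || sides[0].isEmpty() || sides[1].isEmpty()) {
            throw new IllegalArgumentException("Expected exactly one '=': " + equation);
        }
        double[] left = parseSide(sides[0]);
        double[] right = parseSide(sides[1]);
        // a*x + b = c*x + d  =>  x = (d - b) / (a - c)
        double coefficient = left[0] - right[0];
        if (coefficient == 0) {
            throw new IllegalArgumentException("No unique solution: " + equation);
        }
        return (right[1] - left[1]) / coefficient;
    }

    /** Returns {coefficient of x, constant term} for one side of the equation. */
    private static double[] parseSide(String side) {
        double coeffX = 0;
        double constant = 0;
        // Turn every '-' into "+-" so that splitting on '+' keeps each term's sign.
        for (String term : side.replace("-", "+-").split("\\+")) {
            if (term.isEmpty()) {
                continue; // leading '+', or the '+' inserted before a leading '-'
            }
            if (term.endsWith("x")) {
                String c = term.substring(0, term.length() - 1);
                if (c.isEmpty()) {
                    coeffX += 1;          // "x"
                } else if (c.equals("-")) {
                    coeffX -= 1;          // "-x"
                } else {
                    coeffX += Double.parseDouble(c); // "2x", "-3.5x"
                }
            } else {
                constant += Double.parseDouble(term); // throws on OCR garbage
            }
        }
        return new double[] {coeffX, constant};
    }
}
```

For example, `LinearEquationSolver.solve("5 - x = 2x - 1")` returns `2.0`. Anything richer (fractions, exponents, several variables) would need a real expression parser.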

Any help would be appreciated. Thanks.

protected String onPhotoTaken()
{
    // Requires the lang.traineddata file to be shipped with the app (in the assets folder).
    // You can get them at:
    // http://code.google.com/p/tesseract-ocr/downloads/list
    // This area needs work and optimization.
    boIsTaken = true;

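    BitmapFactory.Options options = new BitmapFactory.Options();
    // Decodes at 1/4 of the original width and height (1/16 of the pixels).
    // Small characters such as exponents and operators may become too small to recognize.
    options.inSampleSize = 4;

    Bitmap bitmap = BitmapFactory.decodeFile(strTakenPicPath, options);
    if (bitmap == null) {
        Log.e(TAG, "Couldn't decode image: " + strTakenPicPath);
        return "";
    }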

    try {
        ExifInterface exif = new ExifInterface(strTakenPicPath);
        int exifOrientation = exif.getAttributeInt(
                ExifInterface.TAG_ORIENTATION,
                ExifInterface.ORIENTATION_NORMAL);

        Log.v(TAG, "Orient: " + exifOrientation);

        int rotate = 0;

        switch (exifOrientation) {
        case ExifInterface.ORIENTATION_ROTATE_90:
            rotate = 90;
            break;
        case ExifInterface.ORIENTATION_ROTATE_180:
            rotate = 180;
            break;
        case ExifInterface.ORIENTATION_ROTATE_270:
            rotate = 270;
            break;
        }

        Log.v(TAG, "Rotation: " + rotate);

        if (rotate != 0) {

            // Getting width & height of the given image.
            int w = bitmap.getWidth();
            int h = bitmap.getHeight();

            // Setting pre rotate
            Matrix mtx = new Matrix();
            mtx.preRotate(rotate);

            // Rotating Bitmap
            bitmap = Bitmap.createBitmap(bitmap, 0, 0, w, h, mtx, false);
        }
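
    } catch (IOException e) {
        Log.e(TAG, "Couldn't correct orientation: " + e.toString());
    }

    // Convert to ARGB_8888, required by tess. This is done outside the try block
    // so the conversion still happens when the EXIF data can't be read.
    if (bitmap.getConfig() != Bitmap.Config.ARGB_8888) {
        bitmap = bitmap.copy(Bitmap.Config.ARGB_8888, true);
    }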

    // _image.setImageBitmap( bitmap );

    Log.v(TAG, "Before baseApi");

    TessBaseAPI baseApi = new TessBaseAPI();
    Log.v(TAG, "initialize baseApi");
    baseApi.setDebug(true);
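    // getLang() returns "equ" for equation detection and "eng" for plain text.
    // DATA_PATH must be the directory that contains the "tessdata" folder.
    if (!baseApi.init(DATA_PATH, getLang())) {
        Log.e(TAG, "Couldn't initialize Tesseract for language: " + getLang());
        baseApi.end();
        return "";
    }
    Log.v(TAG, "init baseApi done");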
    baseApi.setImage(bitmap);

    String recognizedText = baseApi.getUTF8Text();

    baseApi.end();

    // You now have the text in recognizedText var, you can do anything with it.
    // We will display a stripped out trimmed alpha-numeric version of it (if lang is eng)
    // so that garbage doesn't make it to the display.

    Log.v(TAG, "Detected TEXT: " + recognizedText);

    if ( getLang().equalsIgnoreCase("eng") ) {
        recognizedText = recognizedText.replaceAll("[^a-zA-Z0-9]+", " ");
    }

    recognizedText = recognizedText.trim();
    return recognizedText;
}//end onPhotoTaken
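The method above only strips unexpected characters when the language is eng; in equ mode the raw output goes straight back to the caller. A hypothetical counterpart for equ mode is sketched below, assuming an equation only uses digits, ASCII letters, whitespace and the characters `+ - * / = ( ) . ^`. `EquationTextFilter` is an illustrative name, not part of tess-two.

```java
/**
 * Hypothetical post-processing filter for "equ" mode, mirroring the
 * alphanumeric filter that onPhotoTaken() applies in "eng" mode.
 */
final class EquationTextFilter {

    // Assumed equation alphabet: digits, ASCII letters (variables), whitespace,
    // and a few operators/brackets. Every run of other characters is dropped.
    private static final String NOT_EQUATION_CHARS = "[^0-9a-zA-Z+\\-*/=().^\\s]+";

    private EquationTextFilter() {}

    static String clean(String recognizedText) {
        if (recognizedText == null) {
            return "";
        }
        return recognizedText
                .replaceAll(NOT_EQUATION_CHARS, " ") // replace unexpected characters with a space
                .replaceAll("\\s+", " ")             // collapse runs of whitespace
                .trim();
    }
}
```

It could be called as `recognizedText = EquationTextFilter.clean(recognizedText);` in an `else` branch of the eng check. Note that it also removes any symbol outside the assumed set, such as ×, ÷ or √, so the character class would need extending if the recognizer emits them. Filtering only hides garbage; it cannot recover symbols the recognizer misread.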

[Image: a photo containing an equation, and the Tesseract result for it]

Best answer

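That's because equ.traineddata is ******. I use eng.traineddata for digit recognition. Maybe we need to train our own .traineddata to detect math equations :S

If you find any math .traineddata that works better than equ, please let me know.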

Original question on Stack Overflow: https://stackoverflow.com/questions/29527857/
