c++ - OpenALPR 退出并出现带有自定义训练数据的段错误

标签 c++ ocr tesseract

我正在使用 OpenALPR,我已经训练 OCR 识别强制字体。当我尝试使用经过训练的数据时,alpr 退出并出现段错误。

我使用的是 1.2.0 版和 tesseract 3.03,以及 leptonica-1.71

当我用 gdb 运行它时,我得到以下堆栈跟踪:

(gdb) bt
#0  0x00007ffff67ded7b in tesseract::Classify::ComputeCharNormArrays(FEATURE_STRUCT*, INT_TEMPLATES_STRUCT*, unsigned char*, unsigned char*) () from /usr/local/lib/libtesseract.so.3
#1  0x00007ffff67e3b6b in tesseract::Classify::CharNormTrainingSample(bool, int, tesseract::TrainingSample const&, GenericVector<tesseract::UnicharRating>*) () from /usr/local/lib/libtesseract.so.3
#2  0x00007ffff6806882 in tesseract::TessClassifier::UnicharClassifySample(tesseract::TrainingSample const&, Pix*, int, int, GenericVector<tesseract::UnicharRating>*) () from /usr/local/lib/libtesseract.so.3
#3  0x00007ffff67e1b22 in tesseract::Classify::CharNormClassifier(TBLOB*, tesseract::TrainingSample const&, ADAPT_RESULTS*) () from /usr/local/lib/libtesseract.so.3
#4  0x00007ffff67e1c95 in tesseract::Classify::DoAdaptiveMatch(TBLOB*, ADAPT_RESULTS*) () from /usr/local/lib/libtesseract.so.3
#5  0x00007ffff67e1f24 in tesseract::Classify::AdaptiveClassifier(TBLOB*, BLOB_CHOICE_LIST*) () from /usr/local/lib/libtesseract.so.3
#6  0x00007ffff67d993d in tesseract::Wordrec::call_matcher(TBLOB*) () from /usr/local/lib/libtesseract.so.3
#7  0x00007ffff67d9986 in tesseract::Wordrec::classify_blob(TBLOB*, char const*, C_COL, BlamerBundle*) () from /usr/local/lib/libtesseract.so.3
#8  0x00007ffff67d6bab in tesseract::Wordrec::classify_piece(GenericVector<SEAM*> const&, short, short, char const*, TWERD*, BlamerBundle*) () from /usr/local/lib/libtesseract.so.3
#9  0x00007ffff67c808d in tesseract::Wordrec::chop_word_main(WERD_RES*) () from /usr/local/lib/libtesseract.so.3
#10 0x00007ffff67d9821 in tesseract::Wordrec::cc_recog(WERD_RES*) () from /usr/local/lib/libtesseract.so.3
#11 0x00007ffff6716e62 in tesseract::Tesseract::recog_word_recursive(WERD_RES*) () from /usr/local/lib/libtesseract.so.3
#12 0x00007ffff6716ff5 in tesseract::Tesseract::recog_word(WERD_RES*) () from /usr/local/lib/libtesseract.so.3
#13 0x00007ffff6708160 in tesseract::Tesseract::tess_segment_pass_n(int, WERD_RES*) () from /usr/local/lib/libtesseract.so.3
#14 0x00007ffff66cf2a5 in tesseract::Tesseract::match_word_pass_n(int, WERD_RES*, ROW*, BLOCK*) () from /usr/local/lib/libtesseract.so.3
#15 0x00007ffff66cf482 in tesseract::Tesseract::classify_word_pass1(tesseract::WordData*, WERD_RES*) () from /usr/local/lib/libtesseract.so.3
#16 0x00007ffff66d26d6 in tesseract::Tesseract::classify_word_and_language(void (tesseract::Tesseract::*)(tesseract::WordData*, WERD_RES*), tesseract::WordData*) () from /usr/local/lib/libtesseract.so.3
#17 0x00007ffff66d2dea in tesseract::Tesseract::RecogAllWordsPassN(int, ETEXT_DESC*, GenericVector<tesseract::WordData>*) () from /usr/local/lib/libtesseract.so.3
#18 0x00007ffff66d3701 in tesseract::Tesseract::recog_all_words(PAGE_RES*, ETEXT_DESC*, TBOX const*, char const*, int) () from /usr/local/lib/libtesseract.so.3
#19 0x00007ffff66c203d in tesseract::TessBaseAPI::Recognize(ETEXT_DESC*) () from /usr/local/lib/libtesseract.so.3
#20 0x00000000004aadf4 in OCR::performOCR (this=0x93d7c0, pipeline_data=0x7fffe0a3a9b0) at /opt/openalpr/src/openalpr/ocr.cpp:79
#21 0x000000000048845b in plateAnalysisThread (arg=0x7fffffffd380) at /opt/openalpr/src/openalpr/alpr_impl.cpp:261
#22 0x00000000004df217 in tthread::thread::wrapper_function (aArg=0x93d210) at /opt/openalpr/src/openalpr/support/tinythread.cpp:169
#23 0x00007ffff6401182 in start_thread (arg=0x7fffe0a3b700) at pthread_create.c:312
#24 0x00007ffff590e00d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

最佳答案

我在调试 tesseract 时看到错误发生是因为 shape_table_->getShape(id) 被调用时 id >= to shape_table_ 的大小,在文件 adaptmatch.cpp 上。

作为解决方法,我更改了代码以首先检查大小并跳过迭代而不是使用段错误退出。

此解决方法可能会产生不良后果,但至少它停止了中断。这是差异:

diff --git a/classify/adaptmatch.cpp b/classify/adaptmatch.cpp
index 0eaf144..b21d980 100644
--- a/classify/adaptmatch.cpp
+++ b/classify/adaptmatch.cpp
@@ -1148,7 +1148,7 @@ void Classify::ExpandShapesAndApplyCorrections(
     fontinfo_id = ClassAndConfigIDToFontOrShapeID(class_id, int_result.Config);
     fontinfo_id2 = ClassAndConfigIDToFontOrShapeID(class_id,
                                                    int_result.Config2);
-    if (shape_table_ != NULL) {
+    if (shape_table_ != NULL && fontinfo_id < shape_table_->NumShapes()) {
       // Actually fontinfo_id is an index into the shape_table_ and it
       // contains a list of unchar_id/font_id pairs.
       int shape_id = fontinfo_id;
@@ -1781,10 +1781,12 @@ void Classify::ComputeCharNormArrays(FEATURE_STRUCT* norm_feature,
         int font_set_id = templates->Class[id]->font_set_id;
         const FontSet &fs = fontset_table_.get(font_set_id);
         for (int config = 0; config < fs.size; ++config) {
-          const Shape& shape = shape_table_->GetShape(fs.configs[config]);
-          for (int c = 0; c < shape.size(); ++c) {
-            if (char_norm_array[shape[c].unichar_id] < pruner_array[id])
-              pruner_array[id] = char_norm_array[shape[c].unichar_id];
+          if (shape_table_->NumShapes() > fs.configs[config]) {
+            const Shape shape = shape_table_->GetShape(fs.configs[config]);
+            for (int c = 0; c < shape.size(); ++c) {
+              if (char_norm_array[shape[c].unichar_id] < pruner_array[id])
+                pruner_array[id] = char_norm_array[shape[c].unichar_id];
+            }
           }
         }
       }

关于c++ - OpenALPR 退出并出现带有自定义训练数据的段错误,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/27950157/

相关文章:

c++ - 对话框资源中的语法错误

open-source - 最准确的日语开源 OCR?

c++ - 实现纯虚函数:时出现一些错误

ocr - 土耳其语版 Tesseract OCR 多维数据集文件

python - 导入错误: cannot import name 'model_urls' from 'torchvision.models.vgg'

javascript - Node.js 比使用 Tesseract.Js 的浏览器 (Safari) 慢 20 倍

r - 用 R 做 OCR

java - Tesseract OCR无法正常工作,如何更准确?

使用构造函数填充 vector 的 C++ 奇怪行为

c++ - 从进程句柄获取进程信息