javascript - Loading offline lang data in tesseract.js

Tags: javascript, tesseract.js

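I am trying to load my own trained data into tesseract.js. Since the files are stored locally, I am trying to load everything offline. The code I use is shown below: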

<script src="tesseract.js"></script>

<script>
//Set the worker, core and lang to local files
(function() {
var path = (function() { //absolute path
    var pathArray = window.location.pathname.split( '/' );
    pathArray.pop(); //Remove the last ("**.html")
    return window.location.origin + pathArray.join("/");
})();
console.log(path);

window.Tesseract = Tesseract.create({
    workerPath: path + '/worker.js',
    //langPath: path + '/traineddata/',
    corePath: path + '/index.js',
});
})();
</script>

<script>
function recognizeFile(file){
    document.querySelector("#log").innerHTML = ''

    Tesseract.recognize(file, {
        lang: document.querySelector('#langsel').value
    })
        .progress(function(packet){
            console.info(packet)
            progressUpdate(packet)

        })
        .then(function(data){
            console.log(data)
            progressUpdate({ status: 'done', data: data })
        })
}
</script>

If langPath is not set, the code above works fine. But when I point langPath to a local folder, Tesseract fails to load anything and reports the following errors:

Failed loading language 'eng'
Tesseract couldn't load any languages!

...

AdaptedTemplates != NULL:Error:Assert failed:in file ../classify/adaptmatch.cpp, line 190
SCRIPT0: abort() at Error
   at Na (file:///C:/Users/user/Downloads/tesseract.js-master/dist/index.js:36:24)
   at ka (file:///C:/Users/user/Downloads/tesseract.js-master/dist/index.js:511:83)
   at Module.de._abort (file:///C:/Users/user/Downloads/tesseract.js-master/dist/index.js:377:166)
   at $L (file:///C:/Users/user/Downloads/tesseract.js-master/dist/index.js:387:55709)
   at jpa (file:///C:/Users/user/Downloads/tesseract.js-master/dist/index.js:392:22274)
   at lT (file:///C:/Users/user/Downloads/tesseract.js-master/dist/index.js:391:80568)
   at mT (file:///C:/Users/user/Downloads/tesseract.js-master/dist/index.js:391:80698)
   at BS (file:///C:/Users/user/Downloads/tesseract.js-master/dist/index.js:391:69009)
   at bP (file:///C:/Users/user/Downloads/tesseract.js-master/dist/index.js:387:110094)
   at jT (file:///C:/Users/user/Downloads/tesseract.js-master/dist/index.js:391:80280)
   at RJ (file:///C:/Users/user/Downloads/tesseract.js-master/dist/index.js:387:19088)
   at QJ (file:///C:/Users/user/Downloads/tesseract.js-master/dist/index.js:387:17789)
   at zI (file:///C:/Users/user/Downloads/tesseract.js-master/dist/index.js:403:90852)
   at tw (file:///C:/Users/user/Downloads/tesseract.js-master/dist/index.js:401:49079)
   at rw (file:///C:/Users/user/Downloads/tesseract.js-master/dist/index.js:401:48155)
   at lw (file:///C:/Users/user/Downloads/tesseract.js-master/dist/index.js:401:39071)
   at _v (file:///C:/Users/user/Downloads/tesseract.js-master/dist/index.js:401:22565)
   at aw (file:///C:/Users/user/Downloads/tesseract.js-master/dist/index.js:401:24925)
   at cw (file:///C:/Users/user/Downloads/tesseract.js-master/dist/index.js:401:27237)
   at oj (file:///C:/Users/user/Downloads/tesseract.js-master/dist/index.js:386:24689)
   at Og (file:///C:/Users/user/Downloads/tesseract.js-master/dist/index.js:386:10421)
   at $.prototype.Recognize (file:///C:/Users/user/Downloads/tesseract.js-master/dist/index.js:558:379)
   at Anonymous function (file:///C:/Users/user/Downloads/tesseract.js-master/dist/worker.js:8814:9)
   at Anonymous function (file:///C:/Users/user/Downloads/tesseract.js-master/dist/worker.js:8786:9)
   at xhr.onerror (file:///C:/Users/user/Downloads/tesseract.js-master/dist/worker.js:8429:9)
If this abort() is unexpected, build with -s ASSERTIONS=1 which can give more information.
index.js (8,1)

I have both eng.traineddata and eng.traineddata.gz in the /traineddata folder, because the ungzip step is apparently skipped. Is there anything I am overlooking? Any help is appreciated.

Best answer

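I know this question is old, but I recently needed to use Tesseract.js in one of my projects. I had to load the data files locally, so here is what I did.

Instead of creating a new worker, I modified the default worker options that are already available. So I did not use Tesseract.createWorker; I set the paths directly and then called recognize.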

 // Directory that holds the language data files.
 // Example result: protocol://domain.com/scripts/tesseract/dist/
 Tesseract.workerOptions.langPath =
           window.location.origin // take protocol://domain.com part
           + "/scripts/tesseract/dist/"; // location of data files

 // Optional: you could set the worker path too, but I didn't need it
 Tesseract.workerOptions.workerPath =
           window.location.origin // take protocol://domain.com part
           + "/scripts/tesseract/dist/worker.js"; // location of worker.js

 // Optional: you could set the core path too, but I didn't need it
 Tesseract.workerOptions.corePath =
           window.location.origin // take protocol://domain.com part
           + "/scripts/tesseract/dist/index.js"; // location of index.js

In my project I only set langPath, so the worker path and core path stayed untouched and kept pointing to the default CDN.
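As a minimal sketch, the path settings above could be wrapped in a small helper so that the slashes always come out right. The function name `buildLocalPaths` and its parameters are hypothetical and not part of tesseract.js; the only library property used is `Tesseract.workerOptions`, as in the answer.

```javascript
// Hypothetical helper (not part of tesseract.js): build the three option
// values from an origin and the directory that holds the local files.
function buildLocalPaths(origin, dir) {
    // Strip leading and trailing slashes, then re-add exactly one of each.
    var trimmed = dir.replace(/^\/+|\/+$/g, '');
    var clean = trimmed ? '/' + trimmed + '/' : '/';
    var base = origin.replace(/\/+$/, '') + clean;
    return {
        langPath: base,                 // a directory, ends with "/"
        workerPath: base + 'worker.js', // a file
        corePath: base + 'index.js'     // a file
    };
}

// Browser usage, assuming Tesseract.workerOptions exists as in the answer.
// Only langPath is applied here, mirroring what the answer ended up doing.
if (typeof window !== 'undefined' && window.Tesseract && window.Tesseract.workerOptions) {
    var paths = buildLocalPaths(window.location.origin, 'scripts/tesseract/dist');
    window.Tesseract.workerOptions.langPath = paths.langPath;
}
```

Keeping the path logic in a pure function makes it easy to check the resulting URLs in the console before handing them to the library.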

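PS: When using local worker.js and core.js paths, I got an Uncaught Error on postMessage() in worker.js. That is why I only use a local path for the language data. I still don't know how to fix it or why it happens, but there is an open issue about it that you can follow.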
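One more thing that may be worth checking: the stack trace in the question ends in xhr.onerror and every URL in it starts with file:///, which suggests the page was opened straight from disk. Browsers commonly restrict XHR requests to file:// URLs, so this may be why the language file could not be loaded. The guard below is only a sketch of how to detect that situation; `isServedFromDisk` is a hypothetical name, and serving the folder from a local HTTP server is an assumption about the fix, not something confirmed in this thread.

```javascript
// Hypothetical helper: true when the given page URL uses the file: scheme.
function isServedFromDisk(href) {
    return new URL(href).protocol === 'file:';
}

// In the browser, warn early instead of waiting for the worker to abort.
if (typeof window !== 'undefined' && isServedFromDisk(window.location.href)) {
    console.warn('Page opened via file://; requests for language data may fail. ' +
                 'Try serving the folder over http://localhost instead.');
}
```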

Source: "javascript - Loading offline lang data in tesseract.js" on Stack Overflow: https://stackoverflow.com/questions/44450010/
