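multithreading - How to use Apache OpenNLP in a Node.js application

Tags: multithreading node.js apache architecture opennlp

What is the best way to use Apache OpenNLP with Node.js?

Specifically, I want to use the Name Entity Extraction API. Here is what the documentation says about it. The documentation is poor (it is a new project, I think):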

http://opennlp.apache.org/documentation/manual/opennlp.html#tools.namefind

From the documentation:

To use the Name Finder in a production system it is strongly recommended to embed it directly into the application instead of using the command line interface. First the name finder model must be loaded into memory from disk or another source. In the sample below it is loaded from disk.

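import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.namefind.TokenNameFinderModel;

// Declared outside the try block so it stays in scope for NameFinderME below.
TokenNameFinderModel model = null;

// try-with-resources closes the stream automatically, even on failure.
try (InputStream modelIn = new FileInputStream("en-ner-person.bin")) {
  model = new TokenNameFinderModel(modelIn);
}
catch (IOException e) {
  e.printStackTrace();
}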

There are a number of reasons why the model loading can fail:

Issues with the underlying I/O

The version of the model is not compatible with the OpenNLP version

The model is loaded into the wrong component, for example a tokenizer model is loaded with the TokenNameFinderModel class.

The model content is not valid for some other reason

After the model is loaded the NameFinderME can be instantiated.

NameFinderME nameFinder = new NameFinderME(model);

The initialization is now finished and the Name Finder can be used. The NameFinderME class is not thread safe; it must only be called from one thread. To use multiple threads, create multiple NameFinderME instances that share the same model instance. The input text should be segmented into documents, sentences and tokens. To perform entity detection, an application calls the find method for every sentence in the document. After every document, clearAdaptiveData must be called to clear the adaptive data in the feature generators. Not calling clearAdaptiveData can lead to a sharp drop in the detection rate after a few documents. The following code illustrates that:

for (String[][] document : documents) {

  for (String[] sentence : document) {
    Span[] nameSpans = nameFinder.find(sentence);
    // do something with the names
  }

  // reset the adaptive data before the next document
  nameFinder.clearAdaptiveData();
}

The following snippet shows a call to find:

String[] sentence = new String[]{
    "Pierre",
    "Vinken",
    "is",
    "61",
    "years",
    "old",
    "."
    };

Span[] nameSpans = nameFinder.find(sentence);

The nameSpans array now contains exactly one Span, which marks the name Pierre Vinken. The elements between the begin and end offsets are the name tokens. In this case the begin offset is 0 and the end offset is 2. The Span object also knows the type of the entity. In this case it is person (defined by the model). It can be retrieved with a call to Span.getType(). In addition to the statistical Name Finder, OpenNLP also offers a dictionary and a regular expression name finder implementation.
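To make the offsets in the quoted passage concrete, here is a minimal JavaScript sketch of how a begin offset of 0 and an end offset of 2 select the name tokens. The plain object { start, end, type } and the helper spanText are hypothetical stand-ins for illustration, not part of OpenNLP or of any Node.js binding. The sketch assumes the end offset is exclusive, which matches the example in the documentation.

```javascript
// Returns the tokens covered by a span, joined with spaces.
// `span` is a hypothetical plain object { start, end, type } that mirrors
// the begin/end offsets described above; `end` is treated as exclusive.
function spanText(tokens, span) {
  return tokens.slice(span.start, span.end).join(' ');
}

const tokens = ['Pierre', 'Vinken', 'is', '61', 'years', 'old', '.'];
const span = { start: 0, end: 2, type: 'person' };

console.log(`${span.type}: ${spanText(tokens, span)}`); // prints "person: Pierre Vinken"
```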

Best answer

Check out this Node.js library.
https://github.com/mbejda/Node-OpenNLP
https://www.npmjs.com/package/opennlp

Just run npm install opennlp

and look at the examples on GitHub.

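// Package name taken from the npm link above.
var openNLP = require('opennlp');

var nameFinder = new openNLP().nameFinder;
// sentence holds the input text to analyze.
nameFinder.find(sentence, function(err, results) {
    console.log(results);
});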
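
Since the documentation quoted in the question says the name finder must not be called concurrently and should be reset after each document, a Node.js caller may want to serialize its calls. The following is only a sketch under stated assumptions: it relies on the callback shape find(sentence, callback) shown in the answer above, while findNamesPerDocument, resetAdaptiveData and stubFinder are hypothetical names introduced here. The source does not say whether the library exposes an equivalent of clearAdaptiveData, so the reset hook is optional.

```javascript
const { promisify } = require('util');

// Runs a callback-style finder over documents, one sentence at a time.
// `finder.find(sentence, cb)` follows the shape shown in the answer above.
// `resetAdaptiveData` is a hypothetical optional hook, called once per document.
async function findNamesPerDocument(finder, documents, resetAdaptiveData) {
  const find = promisify(finder.find.bind(finder));
  const output = [];
  for (const sentences of documents) {
    const perSentence = [];
    for (const sentence of sentences) {
      // Await each call so the finder never sees overlapping requests.
      perSentence.push(await find(sentence));
    }
    if (resetAdaptiveData) await resetAdaptiveData();
    output.push(perSentence);
  }
  return output;
}

// Stand-in finder for illustration only; it is not the real library.
const stubFinder = {
  find(sentence, callback) {
    const names = sentence.includes('Pierre Vinken') ? ['Pierre Vinken'] : [];
    setImmediate(() => callback(null, names));
  },
};

// Usage: one document with two sentences.
findNamesPerDocument(stubFinder, [['Pierre Vinken is 61 years old .', 'He is old .']])
  .then((results) => console.log(JSON.stringify(results)));
```

Awaiting each call trades throughput for safety. If parallelism is needed, the quoted documentation suggests one finder instance per worker, all sharing a single loaded model.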

Regarding "multithreading - How to use Apache OpenNLP in a Node.js application", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24353186/
