java - StanfordNLP - ArrayIndexOutOfBoundsException at TokensRegexNERAnnotator.readEntries(TokensRegexNERAnnotator.java:696)

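I want to use Stanford NLP's TokensRegexNERAnnotator to tag the following phrases as SKILL.

AREAS OF EXPERTISE, AREAS OF KNOWLEDGE, COMPUTER SKILLS, TECHNICAL EXPERIENCE, TECHNICAL SKILLS

There are many more text sequences like the ones above.

Code -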

    import java.util.List;
    import java.util.Properties;

    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.pipeline.TokensRegexNERAnnotator;
    import edu.stanford.nlp.util.CoreMap;

    public class SkillNerDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
            // The exception is raised while this annotator reads its file (readEntries).
            pipeline.addAnnotator(new TokensRegexNERAnnotator("./mapping/test_degree.rule", true));
            String[] tests = {"Bachelor of Arts is a good degree.", "Technical Skill is a must have for Software Developer."};

            // Annotate each test sentence in turn.
            for (String txt : tests) {
                System.out.println("String is : " + txt);

                // Create an Annotation holding just the given text.
                Annotation document = new Annotation(txt);
                pipeline.annotate(document);
                List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);

                // Walk the annotated sentences and print each token's annotations.
                for (CoreMap sentence : sentences) {
                    for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                        System.out.println("annotated coreMap sentences : " + token);
                        // NER tag and surface text of the current token.
                        String ne = token.get(CoreAnnotations.NamedEntityTagAnnotation.class);
                        String word = token.get(CoreAnnotations.TextAnnotation.class);
                        System.out.println("Current Word : " + word + " POS :" + token.get(CoreAnnotations.PartOfSpeechAnnotation.class));
                        System.out.println("Lemma : " + token.get(CoreAnnotations.LemmaAnnotation.class));
                        System.out.println("Named Entity : " + ne);
                    }
                }
            }
        }
    }

My regex rule file is -

    $SKILL_FIRST_KEYWORD = "/area of/|/areas of/|/Technical/|/computer/|/professional/"
    $SKILL_KEYWORD = "/knowledge/|/skill/|/skills/|/expertise/|/experience/"

    tokens = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$TokensAnnotation" }

    { ruleType: "tokens", pattern: ($SKILL_FIRST_KEYWORD + $SKILL_KEYWORD), result: "SKILL" }

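I am getting an ArrayIndexOutOfBoundsException. I suspect something is wrong with my rule file. Can someone point out where I am going wrong?

Desired output -

areas of expertise - SKILL

areas of knowledge - SKILL

computer skills - SKILL

and so on.

Thanks in advance.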

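Best answer

You should use TokensRegexAnnotator, not TokensRegexNERAnnotator. TokensRegexNERAnnotator expects a tab-separated mapping file (a pattern, then a tab, then the NER tag), not a TokensRegex rule file, which is most likely why readEntries fails with an ArrayIndexOutOfBoundsException.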
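As a rough illustration of how a reader that splits each line on tabs breaks on a line with no tab, here is a minimal sketch. It is a simplified stand-in, not CoreNLP's actual readEntries code; the class `MappingLineDemo` and the method `readType` are hypothetical names used only for this example.

```java
class MappingLineDemo {

    // Hypothetical, simplified reader: split the line on tabs and return column 1.
    // This only mimics the general "pattern<TAB>type" layout; it is not CoreNLP code.
    static String readType(String line) {
        String[] fields = line.split("\t");
        // With no tab in the line, fields has length 1 and this access throws.
        return fields[1];
    }

    public static void main(String[] args) {
        // A line in tab-separated mapping style: pattern, tab, NER tag.
        String mappingLine = "areas of expertise\tSKILL";
        // A line in TokensRegex rule style: it contains no tab at all.
        String ruleLine = "{ ruleType: \"tokens\", pattern: ($SKILL_FIRST_KEYWORD + $SKILL_KEYWORD), result: \"SKILL\" }";

        System.out.println(readType(mappingLine));
        try {
            readType(ruleLine);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("No second column: " + e.getClass().getSimpleName());
        }
    }
}
```

The first call finds a second column; the second does not, so the array access fails in the same way the question describes.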

See these posts for more information:

TokensRegex rules to get correct output for Named Entities

Getting output in the desired format using TokenRegex
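One way to wire TokensRegexAnnotator into the pipeline is to register it as a custom annotator through properties. The sketch below only builds the `Properties` object, so it runs without CoreNLP on the classpath. The annotator name `skillrules` and the class `TokensRegexPipelineProps` are hypothetical choices for this example, and it assumes the annotator reads its rule file from the `<name>.rules` property. The rule file itself may still need adjusting to valid TokensRegex syntax.

```java
import java.util.Properties;

class TokensRegexPipelineProps {

    // Builds pipeline properties that add TokensRegexAnnotator after ner.
    // "skillrules" is an arbitrary annotator name chosen for this sketch.
    static Properties build(String rulesPath) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,skillrules");
        // Map the custom annotator name to the TokensRegexAnnotator class.
        props.setProperty("customAnnotatorClass.skillrules",
                "edu.stanford.nlp.pipeline.TokensRegexAnnotator");
        // Assumption: the annotator looks up its rule file under "<name>.rules".
        props.setProperty("skillrules.rules", rulesPath);
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("./mapping/test_degree.rule");
        props.forEach((k, v) -> System.out.println(k + " = " + v));
        // With CoreNLP available, the pipeline would then be created as in the question:
        // StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    }
}
```

With this setup there is no separate `pipeline.addAnnotator(...)` call; the pipeline constructs the annotator from the properties.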

Source: this question and its answer come from Stack Overflow: https://stackoverflow.com/questions/43691901/
