nlp - How do I get phrase tags in Stanford CoreNLP?

How can I get the phrase tag that corresponds to each word?

For example, take this sentence:

My dog also likes eating sausage.

With Stanford NLP I can get a parse tree such as

(ROOT (S (NP (PRP$ My) (NN dog)) (ADVP (RB also)) (VP (VBZ likes) (NP (JJ eating) (NN sausage))) (. .)))

Given the tree above, I want the phrase tag that corresponds to each word:

(My - NP), (dog - NP), (also - ADVP), (likes - VP), ...

Is there a simple way to extract these phrase tags?

Please help me.

Best answer

import java.util.List;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;

// I guess this is how you get your parse tree (sentAnno is the sentence annotation).
Tree tree = sentAnno.get(TreeAnnotation.class);

// The children of a Tree are returned as an array of subtrees.
Tree[] children = tree.children();

// Check the label of each subtree to see whether it is what you want (a phrase).
// Only the direct children are inspected here; deeper subtrees need a recursive visit.
for (Tree child : children) {
    if (child.value().equals("NP")) { // set your own rule for what counts as a phrase here
        List<Tree> leaves = child.getLeaves(); // the leaves correspond to the tokens
        for (Tree leaf : leaves) {
            System.out.print(String.format("(%s - NP),", leaf.value()));
        }
    }
}

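This code has not been fully tested, but I think it does roughly what you need. Note also that I have not written anything about visiting the subtrees recursively, but I believe you should be able to do that yourself.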
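The recursive visit that the answer leaves out could look roughly like the sketch below. It is not the answerer's code and it does not use CoreNLP: `Node`, `parse`, `collect`, and `phraseTags` are hypothetical stand-ins that read the bracketed string from the question, so the example runs without the library. It assumes that the "phrase tag" of a word is the label of the node directly above the word's part-of-speech tag. With a real CoreNLP `Tree`, the same walk should be expressible with `children()`, `isPreTerminal()`, and `value()`.

```java
import java.util.ArrayList;
import java.util.List;

public class PhraseTags {
    /** Minimal stand-in for a parse-tree node (hypothetical, not CoreNLP's Tree). */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
        boolean isLeaf() { return children.isEmpty(); }
        // A preterminal is a POS-tag node whose only child is a word.
        boolean isPreTerminal() { return children.size() == 1 && children.get(0).isLeaf(); }
    }

    private static int pos; // cursor for the tiny parser below (not thread-safe)

    /** Parses a Penn Treebank style bracketed string such as "(NP (NN dog))". */
    static Node parse(String s) {
        pos = 0;
        return parseNode(s);
    }

    private static Node parseNode(String s) {
        skipSpaces(s);
        if (s.charAt(pos) == '(') {
            pos++; // consume '('
            Node node = new Node(readToken(s));
            skipSpaces(s);
            while (s.charAt(pos) != ')') {
                node.children.add(parseNode(s));
                skipSpaces(s);
            }
            pos++; // consume ')'
            return node;
        }
        return new Node(readToken(s)); // a bare token is a word (leaf)
    }

    private static void skipSpaces(String s) {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++;
    }

    private static String readToken(String s) {
        int start = pos;
        while (pos < s.length() && !Character.isWhitespace(s.charAt(pos))
                && s.charAt(pos) != '(' && s.charAt(pos) != ')') pos++;
        return s.substring(start, pos);
    }

    /** Recursively pairs each word with the label of the phrase directly above its POS tag. */
    static void collect(Node node, List<String> out) {
        for (Node child : node.children) {
            if (child.isPreTerminal()) {
                out.add("(" + child.children.get(0).label + " - " + node.label + ")");
            } else {
                collect(child, out); // descend into nested phrases
            }
        }
    }

    static List<String> phraseTags(String bracketed) {
        List<String> out = new ArrayList<>();
        collect(parse(bracketed), out);
        return out;
    }

    public static void main(String[] args) {
        String tree = "(ROOT (S (NP (PRP$ My) (NN dog)) (ADVP (RB also)) "
                + "(VP (VBZ likes) (NP (JJ eating) (NN sausage))) (. .)))";
        // prints: (My - NP), (dog - NP), (also - ADVP), (likes - VP), (eating - NP), (sausage - NP), (. - S)
        System.out.println(String.join(", ", phraseTags(tree)));
    }
}
```

Because each word is paired with its closest enclosing phrase, "eating" and "sausage" are reported under the inner NP rather than the VP. If you want a coarser grouping instead, you could pass the desired ancestor label down through the recursion.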

The original question, "nlp - How do I get phrase tags in Stanford CoreNLP?", is on Stack Overflow: https://stackoverflow.com/questions/14373557/
