java - Extracting the head noun

Tags: java stanford-nlp

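How can I extract the head noun? I tried a constituency parser, which did not work, and I think I have to use a dependency parser instead. I ran this demo code, but it gave me a wrong answer.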

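import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;

import edu.stanford.nlp.io.IOUtils;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

public class dependencydemo {
  public static void main(String[] args) throws IOException {
    // Write to the file named by the second argument, or to stdout.
    PrintWriter out;
    if (args.length > 1) {
      out = new PrintWriter(args[1]);
    } else {
      out = new PrintWriter(System.out);
    }

    StanfordCoreNLP pipeline = new StanfordCoreNLP();

    // Read the text from the file named by the first argument, or use a built-in sample.
    Annotation annotation;
    if (args.length > 0) {
      annotation = new Annotation(IOUtils.slurpFileNoExceptions(args[0]));
    } else {
      annotation = new Annotation(
          "Yesterday, I went to the Dallas Country Club to play 25 cent Bingo.  "
          + "While I was there I talked to my friend Jim and we both agree that "
          + "those people in Washington are destroying our economy.");
    }

    pipeline.annotate(annotation);
    pipeline.prettyPrint(annotation, out);

    // Print the constituency parse of the first sentence only.
    List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
    if (sentences != null && sentences.size() > 0) {
      CoreMap sentence = sentences.get(0);
      Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
      // out.println();
      // out.println("The first sentence parsed is:");
      tree.pennPrint(out);
    }
    out.flush(); // make sure buffered output is actually written
  }
}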

Output:

(ROOT
  (S
    (NP-TMP (NN Yesterday))
    (, ,)
    (NP (PRP I))
    (VP (VBD went)
      (PP (TO to)
        (NP (DT the) (NNP Dallas) (NNP Country) (NNP Club)))
      (S
        (VP (TO to)
          (VP (VB play)
            (S
              (NP (CD 25) (NN cent))
              (NP (NNP Bingo)))))))
    (. .)))

Dependencies:

root(ROOT-0, went-4)
tmod(went-4, Yesterday-1)
nsubj(went-4, I-3)
det(Club-9, the-6)
nn(Club-9, Dallas-7)
nn(Club-9, Country-8)
prep_to(went-4, Club-9)
aux(play-11, to-10)
xcomp(went-4, play-11)
num(cent-13, 25-12)
nsubj(Bingo-14, cent-13)
xcomp(play-11, Bingo-14)

How can I extract the head noun from this? Apart from that, the output itself does not seem correct.

Best answer

From your explanation in the comments, my impression is that you want the head constituents of all noun phrases. This is easy to do with CoreNLP.

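  1. First, find all the noun phrases. You can do this with a simple Tregex pattern (see Chris Manning's relevant answer).
  2. Then use a CoreNLP "head finder" to pick out the syntactic head constituent of each matched noun phrase. See, for example, ModCollinsHeadFinder.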

Demo code follows.

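import edu.stanford.nlp.trees.HeadFinder;
import edu.stanford.nlp.trees.PennTreebankLanguagePack;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.tregex.TregexMatcher;
import edu.stanford.nlp.trees.tregex.TregexPattern;

// Fetch a head finder.
HeadFinder hf = new PennTreebankLanguagePack().headFinder();

Tree myTree = ...
// Match every node labeled NP in the tree.
TregexPattern tPattern = TregexPattern.compile("NP");
TregexMatcher tMatcher = tPattern.matcher(myTree);
while (tMatcher.find()) {
  Tree nounPhrase = tMatcher.getMatch();

  // determineHead returns the head child of the NP;
  // nounPhrase.headTerminal(hf) descends all the way to the head word.
  Tree headConstituent = hf.determineHead(nounPhrase);
  System.out.println(headConstituent);
}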
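
To make the two steps concrete without the CoreNLP jar on the classpath, here is a minimal plain-Java sketch. It reads a Penn-bracketed parse like the one in the question, collects every node labeled exactly NP, and picks a head word with a deliberately simplified rule: take the rightmost child whose tag starts with NN or NP, otherwise the last child. That rule is an assumption made for illustration, not the Collins head rules that CoreNLP's head finders implement, and the names HeadNounSketch, Node, headWord and headNouns are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Penn-bracket reader plus a simplified "rightmost noun" head rule (NOT the Collins rules). */
class HeadNounSketch {

  /** A tree node: a label plus children. A leaf stores the word itself as its label. */
  static final class Node {
    final String label;
    final List<Node> children = new ArrayList<>();
    Node(String label) { this.label = label; }
    boolean isLeaf() { return children.isEmpty(); }
    boolean isPreTerminal() { return children.size() == 1 && children.get(0).isLeaf(); }
  }

  private final String[] tokens;
  private int pos = 0;

  private HeadNounSketch(String penn) {
    // Pad brackets with spaces so splitting on whitespace yields one token per bracket, tag, or word.
    tokens = penn.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
  }

  /** Recursive descent over: "(" LABEL child* ")" or a bare word. Assumes well-formed input. */
  private Node parse() {
    String tok = tokens[pos++];
    if (!tok.equals("(")) return new Node(tok);
    Node node = new Node(tokens[pos++]);
    while (!tokens[pos].equals(")")) node.children.add(parse());
    pos++; // consume ")"
    return node;
  }

  /** Step 2: walk down from an NP to a head word using the simplified rule. */
  static String headWord(Node np) {
    Node current = np;
    while (!current.isLeaf()) {
      if (current.isPreTerminal()) return current.children.get(0).label;
      Node pick = null;
      for (Node child : current.children) {
        // Later matches overwrite earlier ones, so the rightmost noun-like child wins.
        if (child.label.startsWith("NN") || child.label.startsWith("NP")) pick = child;
      }
      if (pick == null) pick = current.children.get(current.children.size() - 1);
      current = pick;
    }
    return current.label;
  }

  /** Step 1: visit nodes in pre-order and record a head word for each node labeled exactly NP. */
  private static void collect(Node node, List<String> heads) {
    if (node.label.equals("NP") && !node.isLeaf()) heads.add(headWord(node));
    for (Node child : node.children) collect(child, heads);
  }

  static List<String> headNouns(String penn) {
    Node root = new HeadNounSketch(penn).parse();
    List<String> heads = new ArrayList<>();
    collect(root, heads);
    return heads;
  }

  public static void main(String[] args) {
    // The first-sentence parse from the question.
    String penn =
        "(ROOT (S (NP-TMP (NN Yesterday)) (, ,) (NP (PRP I))"
        + " (VP (VBD went)"
        + " (PP (TO to) (NP (DT the) (NNP Dallas) (NNP Country) (NNP Club)))"
        + " (S (VP (TO to) (VP (VB play)"
        + " (S (NP (CD 25) (NN cent)) (NP (NNP Bingo)))))))"
        + " (. .)))";
    System.out.println(headNouns(penn)); // prints [I, Club, cent, Bingo]
  }
}
```

Note that NP-TMP is skipped because only the exact label NP is matched, and a pronoun NP such as (NP (PRP I)) yields the pronoun. For real text, the CoreNLP head finder shown above is the better choice, since the rightmost-noun shortcut gets cases such as coordination and postmodified nouns wrong.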

This Q&A on extracting the head noun in Java comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/29265488/
