Java Stanford NLP: Find word frequency?

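I am using the Stanford NLP parsing toolkit. Given a word from the lexicon, how can I find its frequency*? Or, given a frequency rank, how can I determine the corresponding word?

*In the language as a whole, not just in a text sample.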

Here is a demo of the toolkit I am using:

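import java.util.Arrays;
import java.util.Collection;

import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.*;

class ParserDemo {
  public static void main(String[] args) {
    // Load the serialized English PCFG grammar and set parser options.
    LexicalizedParser lp = new LexicalizedParser("englishPCFG.ser.gz");
    lp.setOptionFlags(new String[]{"-maxLength", "80", "-retainTmpSubcategories"});

    // Parse a pre-tokenized sentence and print the tree in Penn Treebank format.
    String[] sent = { "Sincerity", "may", "frighten", "the", "boy", "." };
    Tree parse = (Tree) lp.apply(Arrays.asList(sent));
    parse.pennPrint();
    System.out.println();

    // Derive the collapsed typed dependencies from the parse tree.
    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    Collection tdl = gs.typedDependenciesCollapsed();
    System.out.println(tdl);
    System.out.println();

    // Print both representations in one call.
    TreePrint tp = new TreePrint("penn,typedDependenciesCollapsed");
    tp.printTree(parse);
  }
}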

Best answer

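If you only need to count word frequencies, sentence parsing is unnecessary. All you need to do is tokenize the input and then count the occurrences of each word with a Java HashMap. If you want to use the Stanford tools, use any of the tokenizers in edu.stanford.nlp.process.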
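As a rough illustration of the tokenize-then-count idea, here is a minimal sketch. To stay free of dependencies it assumes a naive tokenizer (lowercase the text, then split on anything that is not a letter) rather than a Stanford tokenizer; the class name WordFrequency and the method countWords are made up for this example. A tokenizer from edu.stanford.nlp.process could be substituted for the split step.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Stanford toolkit.
class WordFrequency {
    // Counts how often each token occurs in the given text.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Assumption: naive tokenization, lowercase then split on runs of non-letters.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue; // split can yield a leading empty string
            }
            counts.merge(token, 1, Integer::sum); // insert 1 or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("The boy saw the dog. The dog ran.");
        System.out.println(counts.get("the")); // 3
        System.out.println(counts.get("dog")); // 2
    }
}
```

Note that the counts describe only the text you feed in; a larger corpus gives a better estimate of frequency in the language as a whole.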

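This gives you the frequency of any given word, but in general it may not be possible to find the single word corresponding to a given frequency rank, because some words may occur equally often in the document.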
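One way to live with ties is to let a rank map to a list of words rather than a single word. The sketch below assumes a map of word counts is already available and treats words with equal counts as sharing one rank (so rank 2 means "second-highest count"); the names FrequencyRank and wordsAtRank are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of the Stanford toolkit.
class FrequencyRank {
    // Returns every word whose count is the rank-th highest (rank 1 = most frequent).
    // Words with equal counts share a rank, so the result may hold several words.
    static List<String> wordsAtRank(Map<String, Integer> counts, int rank) {
        // Group words by count, highest count first.
        TreeMap<Integer, List<String>> byCount = new TreeMap<>(Collections.reverseOrder());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            byCount.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        int current = 0;
        for (List<String> words : byCount.values()) {
            if (++current == rank) {
                Collections.sort(words); // deterministic order within a tie
                return words;
            }
        }
        return Collections.emptyList(); // rank exceeds the number of distinct counts
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = Map.of("the", 3, "dog", 2, "boy", 1, "saw", 1, "ran", 1);
        System.out.println(wordsAtRank(counts, 1)); // [the]
        System.out.println(wordsAtRank(counts, 3)); // [boy, ran, saw]
    }
}
```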

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/1816800/
