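I am working with Stanford CoreNLP and have a question. I want to determine the grammatical category of each word. When I process the text from the command line with: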
java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -props StanfordCoreNLP-spanish.properties -annotators tokenize,ssplit,pos,ner -file entrada.txt -outputFormat conll
the output is:
1 tomar _ VERB _ _ _
2 una _ DET _ _ _
3 cerveza _ NOUN _ _ _
4 en _ ADP _ _ _
5 Madrid _ PROPN _ _ _
But when I run it from NetBeans with the following code:
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner");
props.setProperty("tokenize.language", "es");
props.setProperty("pos.model", "edu/stanford/nlp/models/pos-tagger/spanish/spanish-distsim.tagger");
props.setProperty("ner.model", "edu/stanford/nlp/models/ner/spanish.ancora.distsim.s512.crf.ser.gz");
props.setProperty("ner.applyNumericClassifiers", "true");
props.setProperty("ner.useSUTime", "false");
props.setProperty("ner.applyFineGrained", "false");
props.setProperty("ner.language", "es");
String text = "Ver una película de miedo, pasear por un parque";
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
Annotation document = new Annotation(text);
pipeline.annotate(document);
List<CoreMap> sentences = document.get(SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
        String g = token.tag();
        String word = token.get(TextAnnotation.class);
        String pos = token.get(PartOfSpeechAnnotation.class);
        String ne = token.get(NamedEntityTagAnnotation.class);
        String lema = token.get(LemmaAnnotation.class);
        System.out.println(String.format("[%s] "
                + "[%s] "
                + "[%s] "
                + "[%s] ", word, pos, ne, lema));
    }
}
the output is:
[Ver] [vmn0000] [O] [ver]
[una] [di0000] [O] [una]
[película] [nc0s000] [O] [película]
[de] [sp000] [O] [de]
[miedo] [nc0s000] [O] [miedo]
[,] [fc] [O] [,]
[pasear] [vmn0000] [O] [pasear]
[por] [sp000] [O] [por]
[un] [di0000] [O] [un]
[parque] [nc0s000] [O] [parque]
So, how can I convert tags such as "vmn0000" into "Verb"?
Thanks in advance!!
Best answer
Make sure you are using Stanford CoreNLP 3.9.2 and the UD model for part-of-speech tagging:
edu/stanford/nlp/models/pos-tagger/spanish/spanish-ud.tagger
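As a rough sketch of what that change might look like in the NetBeans code, the snippet below builds the pipeline properties with pos.model pointing at the UD tagger instead of spanish-distsim.tagger. The class name SpanishUdPosConfig and the helper buildProps are hypothetical names used only for illustration; the snippet uses java.util.Properties alone, so it runs without CoreNLP on the classpath and only shows the configuration, not the tagging itself.

```java
import java.util.Properties;

// Hypothetical helper class, not part of CoreNLP: it only assembles the configuration.
public class SpanishUdPosConfig {

    // Same style of properties as in the question, but with the UD tagger from the answer.
    static Properties buildProps() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos");
        props.setProperty("tokenize.language", "es");
        // Model path taken from the answer; it must exist in the Spanish models jar in use.
        props.setProperty("pos.model",
                "edu/stanford/nlp/models/pos-tagger/spanish/spanish-ud.tagger");
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildProps();
        // Print the configured tagger path to confirm which model would be loaded.
        System.out.println(props.getProperty("pos.model"));
        // Assumption: with CoreNLP 3.9.2 and the Spanish models on the classpath,
        // these properties would be passed to new StanfordCoreNLP(props) as in the question.
    }
}
```

Passing these properties to the pipeline in place of the ones from the question should, if the model behaves as in the command-line run, yield tags such as VERB, DET and NOUN instead of vmn0000, di0000 and nc0s000.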
Regarding java - How to use Stanford NLP part-of-speech tagging in Spanish?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58754796/