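I'm testing Stanford's RelationExtractor on a simple sentence:
Microsoft is based in New York.
(It isn't.)
When I annotate the sentence in Java, using the CoreNLP jar files directly, I get the desired result: CoreNLP finds an OrgBased_In relation between Microsoft and New York.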
for (CoreMap sentence : sentences) {
    // Type of the first relation mention found in this sentence
    String relationType = sentence.get(MachineReadingAnnotations.RelationMentionsAnnotation.class).get(0).getType(); // => OrgBased_In
}
However, I then sent the same sentence to the CoreNLP Server like this:
curl --data 'Microsoft is based in New York.' 'http://localhost:9000/?properties={%22annotators%22%3A%22tokenize%2Cssplit%2Cpos%2Clemma%2Cner%2Cparse%2Cdepparse%2Crelation%22%2C%22outputFormat%22%3A%22json%22}' -o -
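The percent-encoded query string in that request is hard to read, so here is a minimal sketch of how the same URL could be built in plain Java. The class and method names (`ServerUrlSketch`, `buildUrl`) are hypothetical and not part of CoreNLP, and the sketch assumes the annotator list contains no characters that need JSON escaping. Unlike the curl command, `URLEncoder` also encodes the braces (`%7B`, `%7D`), which decodes to the same JSON object.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of CoreNLP.
final class ServerUrlSketch {

    // Builds the "properties" JSON object and percent-encodes it for the query string.
    // Assumes the arguments contain no quotes or backslashes.
    static String buildUrl(String baseUrl, String annotators, String outputFormat) {
        String props = "{\"annotators\":\"" + annotators + "\","
                     + "\"outputFormat\":\"" + outputFormat + "\"}";
        return baseUrl + "/?properties=" + URLEncoder.encode(props, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same annotator list as in the curl request.
        System.out.println(buildUrl("http://localhost:9000",
                "tokenize,ssplit,pos,lemma,ner,parse,depparse,relation", "json"));
    }
}
```

The sentence itself would still go in the POST body, as with `curl --data`.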
The JSON response contains no relation data:
{'sentences': [{'basicDependencies': [{'dep': 'ROOT',
'dependent': 3,
'dependentGloss': 'based',
'governor': 0,
'governorGloss': 'ROOT'},
{'dep': 'nsubjpass',
'dependent': 1,
'dependentGloss': 'Microsoft',
'governor': 3,
'governorGloss': 'based'},
{'dep': 'auxpass',
'dependent': 2,
'dependentGloss': 'is',
'governor': 3,
'governorGloss': 'based'},
{'dep': 'case',
'dependent': 4,
'dependentGloss': 'in',
'governor': 6,
'governorGloss': 'York'},
{'dep': 'compound',
'dependent': 5,
'dependentGloss': 'New',
'governor': 6,
'governorGloss': 'York'},
{'dep': 'nmod',
'dependent': 6,
'dependentGloss': 'York',
'governor': 3,
'governorGloss': 'based'},
{'dep': 'punct',
'dependent': 7,
'dependentGloss': '.',
'governor': 3,
'governorGloss': 'based'}],
'enhancedDependencies': [{'dep': 'ROOT',
'dependent': 3,
'dependentGloss': 'based',
'governor': 0,
'governorGloss': 'ROOT'},
{'dep': 'nsubjpass',
'dependent': 1,
'dependentGloss': 'Microsoft',
'governor': 3,
'governorGloss': 'based'},
{'dep': 'auxpass',
'dependent': 2,
'dependentGloss': 'is',
'governor': 3,
'governorGloss': 'based'},
{'dep': 'case',
'dependent': 4,
'dependentGloss': 'in',
'governor': 6,
'governorGloss': 'York'},
{'dep': 'compound',
'dependent': 5,
'dependentGloss': 'New',
'governor': 6,
'governorGloss': 'York'},
{'dep': 'nmod:in',
'dependent': 6,
'dependentGloss': 'York',
'governor': 3,
'governorGloss': 'based'},
{'dep': 'punct',
'dependent': 7,
'dependentGloss': '.',
'governor': 3,
'governorGloss': 'based'}],
'enhancedPlusPlusDependencies': [{'dep': 'ROOT',
'dependent': 3,
'dependentGloss': 'based',
'governor': 0,
'governorGloss': 'ROOT'},
{'dep': 'nsubjpass',
'dependent': 1,
'dependentGloss': 'Microsoft',
'governor': 3,
'governorGloss': 'based'},
{'dep': 'auxpass',
'dependent': 2,
'dependentGloss': 'is',
'governor': 3,
'governorGloss': 'based'},
{'dep': 'case',
'dependent': 4,
'dependentGloss': 'in',
'governor': 6,
'governorGloss': 'York'},
{'dep': 'compound',
'dependent': 5,
'dependentGloss': 'New',
'governor': 6,
'governorGloss': 'York'},
{'dep': 'nmod:in',
'dependent': 6,
'dependentGloss': 'York',
'governor': 3,
'governorGloss': 'based'},
{'dep': 'punct',
'dependent': 7,
'dependentGloss': '.',
'governor': 3,
'governorGloss': 'based'}],
'index': 0,
'parse': '(ROOT\n'
' (S\n'
' (NP (NNP Microsoft))\n'
' (VP (VBZ is)\n'
' (VP (VBN based)\n'
' (PP (IN in)\n'
' (NP (NNP New) (NNP York)))))\n'
' (. .)))',
'tokens': [{'after': ' ',
'before': '',
'characterOffsetBegin': 0,
'characterOffsetEnd': 9,
'index': 1,
'lemma': 'Microsoft',
'ner': 'ORGANIZATION',
'originalText': 'Microsoft',
'pos': 'NNP',
'word': 'Microsoft'},
{'after': ' ',
'before': ' ',
'characterOffsetBegin': 10,
'characterOffsetEnd': 12,
'index': 2,
'lemma': 'be',
'ner': 'O',
'originalText': 'is',
'pos': 'VBZ',
'word': 'is'},
{'after': ' ',
'before': ' ',
'characterOffsetBegin': 13,
'characterOffsetEnd': 18,
'index': 3,
'lemma': 'base',
'ner': 'O',
'originalText': 'based',
'pos': 'VBN',
'word': 'based'},
{'after': ' ',
'before': ' ',
'characterOffsetBegin': 19,
'characterOffsetEnd': 21,
'index': 4,
'lemma': 'in',
'ner': 'O',
'originalText': 'in',
'pos': 'IN',
'word': 'in'},
{'after': ' ',
'before': ' ',
'characterOffsetBegin': 22,
'characterOffsetEnd': 25,
'index': 5,
'lemma': 'New',
'ner': 'LOCATION',
'originalText': 'New',
'pos': 'NNP',
'word': 'New'},
{'after': '',
'before': ' ',
'characterOffsetBegin': 26,
'characterOffsetEnd': 30,
'index': 6,
'lemma': 'York',
'ner': 'LOCATION',
'originalText': 'York',
'pos': 'NNP',
'word': 'York'},
{'after': '',
'before': '',
'characterOffsetBegin': 30,
'characterOffsetEnd': 31,
'index': 7,
'lemma': '.',
'ner': 'O',
'originalText': '.',
'pos': '.',
'word': '.'}]}]}
In the CoreNLP server terminal I can see that the relation extraction model is loaded:
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.RelationExtractorAnnotator - Loading relation model from edu/stanford/nlp/models/supervised_relation_extractor/roth_relation_model_pipelineNER.ser
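What am I missing here?
Thanks!
Best answer
I don't think anyone ever got around to adding that output to the JSON for that annotator. We can do that eventually.
At the moment, the main relation extraction we support is the new kbp annotator, which extracts the relations defined in the TAC-KBP challenge.
You can find the relation descriptions here: https://tac.nist.gov//2015/KBP/ColdStart/guidelines/TAC_KBP_2015_Slot_Descriptions_V1.0.pdf
Here is an example command I ran: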
java -Xmx8g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,mention,entitymentions,coref,kbp -file microsoft-example.txt -outputFormat json
If you look at the JSON, you'll see that the correct relation has been extracted.
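For anyone configuring the pipeline from Java rather than the command line, the flags above correspond to ordinary pipeline properties. The following is only a sketch: the class name `KbpPropsSketch` is hypothetical, and the step of actually constructing the pipeline is left as a comment so the code runs without the CoreNLP jars.

```java
import java.util.Properties;

// Hypothetical helper class; not part of CoreNLP.
final class KbpPropsSketch {

    // Mirrors the -annotators and -outputFormat flags of the command above.
    static Properties kbpProperties() {
        Properties props = new Properties();
        props.setProperty("annotators",
                "tokenize,ssplit,pos,lemma,ner,parse,mention,entitymentions,coref,kbp");
        props.setProperty("outputFormat", "json");
        return props;
    }

    public static void main(String[] args) {
        // With the CoreNLP jars on the classpath, these properties could be passed to
        // new StanfordCoreNLP(props); that call is omitted here so the sketch stays
        // runnable on its own.
        kbpProperties().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```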
Source: "Stanford CoreNLP server's JSON response is missing RelationExtractor annotations" on Stack Overflow: https://stackoverflow.com/questions/41679542/