stanford-nlp - CoreNLP server does not return entity mentions

Tags: stanford-nlp

I downloaded the CoreNLP server from here and followed these instructions. When I include entitymentions as an annotator:

wget --post-data 'Mark Ronson played a concert in New York.' 'localhost:9000/?properties={"tokenize.whitespace": "true", "annotators": "tokenize,ssplit,pos,entitymentions", "outputFormat": "json"}'

The returned JSON looks like the output below: NER tags are added to each token, but there is no list of mentions.

Any idea why?

(It's worth noting that corenlp.run doesn't seem to return them either; the highlighting there appears to be the result of post-processing.)
{
    "sentences": [
        {
            "index": 0,
            "parse": "SENTENCE_SKIPPED_OR_UNPARSABLE",
            "tokens": [
                {
                    "index": 1,
                    "word": "Mark",
                    "originalText": "Mark",
                    "lemma": "Mark",
                    "characterOffsetBegin": 0,
                    "characterOffsetEnd": 4,
                    "pos": "NNP",
                    "ner": "PERSON"
                },
                {
                    "index": 2,
                    "word": "Ronson",
                    "originalText": "Ronson",
                    "lemma": "Ronson",
                    "characterOffsetBegin": 5,
                    "characterOffsetEnd": 11,
                    "pos": "NNP",
                    "ner": "PERSON"
                },
                {
                    "index": 3,
                    "word": "played",
                    "originalText": "played",
                    "lemma": "play",
                    "characterOffsetBegin": 12,
                    "characterOffsetEnd": 18,
                    "pos": "VBD",
                    "ner": "O"
                },
                {
                    "index": 4,
                    "word": "a",
                    "originalText": "a",
                    "lemma": "a",
                    "characterOffsetBegin": 19,
                    "characterOffsetEnd": 20,
                    "pos": "DT",
                    "ner": "O"
                },
                {
                    "index": 5,
                    "word": "concert",
                    "originalText": "concert",
                    "lemma": "concert",
                    "characterOffsetBegin": 21,
                    "characterOffsetEnd": 28,
                    "pos": "NN",
                    "ner": "O"
                },
                {
                    "index": 6,
                    "word": "in",
                    "originalText": "in",
                    "lemma": "in",
                    "characterOffsetBegin": 29,
                    "characterOffsetEnd": 31,
                    "pos": "IN",
                    "ner": "O"
                },
                {
                    "index": 7,
                    "word": "New",
                    "originalText": "New",
                    "lemma": "New",
                    "characterOffsetBegin": 32,
                    "characterOffsetEnd": 35,
                    "pos": "NNP",
                    "ner": "LOCATION"
                },
                {
                    "index": 8,
                    "word": "York.",
                    "originalText": "York.",
                    "lemma": "York.",
                    "characterOffsetBegin": 36,
                    "characterOffsetEnd": 41,
                    "pos": "NNP",
                    "ner": "LOCATION"
                }
            ]
        }
    ]
}

Best answer

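For better or worse, we don't currently write entity mentions out in any of our outputters. The recommended workaround is to post-process the data the same way the entity mentions annotator does: a run of consecutive tokens with the same NER tag is treated as one entity mention. I believe all the annotations on the entity mention object are also attached to its component tokens.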
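A minimal Python sketch of that workaround, assuming the JSON layout shown in the question (a `sentences` list whose `tokens` carry `ner`, `characterOffsetBegin` and `characterOffsetEnd`). The function name `entity_mentions` and the keys of the returned dictionaries are hypothetical illustrations, not part of CoreNLP's API. The helper merges each run of consecutive tokens sharing the same non-`O` tag, and slices the surface form out of the original request text using the character offsets.

```python
from typing import Any, Dict, List


def _to_mention(run: List[Dict[str, Any]], text: str) -> Dict[str, Any]:
    # Build one (hypothetical) mention record from a run of same-tag tokens.
    begin = run[0]["characterOffsetBegin"]
    end = run[-1]["characterOffsetEnd"]
    return {
        "text": text[begin:end],  # surface form recovered from the POSTed text
        "ner": run[0]["ner"],
        "characterOffsetBegin": begin,
        "characterOffsetEnd": end,
    }


def entity_mentions(response: Dict[str, Any], text: str) -> List[Dict[str, Any]]:
    """Group consecutive tokens with the same non-"O" NER tag into mentions.

    `response` is the parsed JSON from the CoreNLP server; `text` must be the
    exact string that was sent, because the character offsets index into it.
    """
    mentions: List[Dict[str, Any]] = []
    for sentence in response.get("sentences", []):
        run: List[Dict[str, Any]] = []  # tokens of the mention being built
        for tok in sentence.get("tokens", []):
            tag = tok.get("ner", "O")
            # A change of tag (including a switch to "O") closes the current run.
            if run and tag != run[-1]["ner"]:
                mentions.append(_to_mention(run, text))
                run = []
            if tag != "O":
                run.append(tok)
        if run:  # a mention may end at the last token of the sentence
            mentions.append(_to_mention(run, text))
    return mentions


if __name__ == "__main__":
    text = "Mark Ronson played a concert in New York."
    # Offsets and tags copied from the response in the question.
    spans = [(0, 4, "PERSON"), (5, 11, "PERSON"), (12, 18, "O"), (19, 20, "O"),
             (21, 28, "O"), (29, 31, "O"), (32, 35, "LOCATION"), (36, 41, "LOCATION")]
    response = {"sentences": [{"tokens": [
        {"characterOffsetBegin": b, "characterOffsetEnd": e, "ner": n}
        for b, e, n in spans]}]}
    for m in entity_mentions(response, text):
        print(m["ner"], repr(m["text"]))
```

Run against the question's output, this prints `PERSON 'Mark Ronson'` and `LOCATION 'New York.'` (the trailing period is there because `tokenize.whitespace` kept it attached to the token). Like any grouping by tag alone, it merges two adjacent entities of the same type into a single mention.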

Source: the Stack Overflow question "stanford-nlp - CoreNLP server does not return entity mentions": https://stackoverflow.com/questions/35582020/
