nlp - How can I tune neuralcoref to get better coreference results?

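I am using neuralcoref, a coreference resolution module built on the spaCy parser. Git: https://github.com/huggingface/neuralcoref

However, the results I get could be better. The online visualizer provided by huggingface (the developers of neuralcoref) gives me more accurate results.

The text I am analyzing: "London is the capital and most populous city of England and the United Kingdom. Standing on the River Thames in the south east of the island of Great Britain, it has been a major settlement for two millennia."

I get this result: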

doc._.coref_resolved

London is the capital and most populous city of England and the United Kingdom. Standing on the River Thames in the south east of the island of Great Britain, the River Thames has been a major settlement for two millennia.

So it wrongly links "it" to the River Thames instead of London (it -> the River Thames).

The neuralcoref online visualizer returns the correct link (it -> London):

https://huggingface.co/coref/?text=London%20is%20the%20capital%20and%20most%20populous%20city%20of%20England%20and%20the%20United%20Kingdom.%20Standing%20on%20the%20River%20Thames%20in%20the%20south%20east%20of%20the%20island%20of%20Great%20Britain%2C%20it%20has%20been%20a%20major%20settlement%20for%20two%20millennia.%20It%20was%20founded%20by%20the%20Romans%2C%20who%20named%20it%20Londinium .

I have already tried adjusting parameters such as greedyness and max_dist, which are mentioned on the project's git page: https://github.com/huggingface/neuralcoref

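import spacy
import neuralcoref

# Load the large English model and add neuralcoref to its pipeline
nlp = spacy.load('en_core_web_lg')
neuralcoref.add_to_pipe(nlp, greedyness=0.5, store_scores=True)

text = ("London is the capital and most populous city of England and the United Kingdom. "
        "Standing on the River Thames in the south east of the island of Great Britain, "
        "it has been a major settlement for two millennia.")
# Sentence used in the online demo but left out here:
# It was founded by the Romans, who named it Londinium.

doc = nlp(text)
print(doc._.coref_resolved)  # text with mentions replaced by their cluster's main mention
print(doc._.coref_scores)    # stored because store_scores=True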

Is there a way to tune it so that I get results similar to the visualizer's?

Thanks!

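Best answer

I don't know why the online tool works better (the online visualizer doesn't work for me).

What you can do is add a conversion dictionary (conv_dict):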

coref = neuralcoref.NeuralCoref(nlp.vocab, conv_dict={'London': ['city', 'settlement']})

https://reposhub.com/python/deep-learning/huggingface-neuralcoref.html
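
As far as the neuralcoref README describes it, conv_dict replaces the embedding of a rare word (the key) with the average of the embeddings of the listed common words, and a component created with NeuralCoref(...) still has to be added to the pipeline before it has any effect. A minimal sketch of the full setup, assuming spaCy 2.x with en_core_web_lg installed and a pipeline that does not already contain a neuralcoref component; CONV_DICT and build_pipeline are names made up for this illustration:

```python
# Hypothetical names for illustration: CONV_DICT and build_pipeline
# are not part of neuralcoref or spaCy.
CONV_DICT = {'London': ['city', 'settlement']}


def build_pipeline(conv_dict, model='en_core_web_lg'):
    """Return a spaCy 2.x pipeline with a NeuralCoref component using conv_dict."""
    # Imported lazily so that defining this helper does not require the libraries.
    import spacy
    import neuralcoref

    nlp = spacy.load(model)
    coref = neuralcoref.NeuralCoref(nlp.vocab, conv_dict=conv_dict)
    # spaCy 2.x style: pass the component object itself plus a name.
    nlp.add_pipe(coref, name='neuralcoref')
    return nlp
```

With that helper, doc = build_pipeline(CONV_DICT)(text) followed by print(doc._.coref_resolved) and print(doc._.coref_clusters) shows whether "it" now lands in the London cluster. There is no guarantee that it does.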

I have a problem of my own: it doesn't work at all for Dutch.
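
To check whether greedyness or max_dist (the parameters tried in the question) change the link at all, it may help to run the same text over a small grid of settings and compare the outputs. A rough sketch only: sweep and neuralcoref_resolve are hypothetical helper names, and the model is reloaded for every setting, which is slow but avoids reconfiguring an existing pipeline:

```python
def sweep(resolve, text, greedyness_values, max_dist_values):
    """Return {(greedyness, max_dist): resolved_text} for every combination."""
    results = {}
    for greedyness in greedyness_values:
        for max_dist in max_dist_values:
            results[(greedyness, max_dist)] = resolve(text, greedyness, max_dist)
    return results


def neuralcoref_resolve(text, greedyness, max_dist):
    """Resolve coreferences with a fresh spaCy 2.x + neuralcoref pipeline."""
    # Imported lazily so that sweep() can be used without the libraries installed.
    import spacy
    import neuralcoref

    nlp = spacy.load('en_core_web_lg')  # reloaded per call for simplicity
    neuralcoref.add_to_pipe(nlp, greedyness=greedyness, max_dist=max_dist)
    return nlp(text)._.coref_resolved
```

A call such as sweep(neuralcoref_resolve, text, [0.4, 0.5, 0.6], [50, 100]) returns one resolved string per setting; the grid values here are arbitrary examples, not recommended settings.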

Regarding "nlp - How can I tune neuralcoref to get better coreference results?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56710356/
