nlp - Using context to improve part-of-speech tagging

Tags: nlp

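Are there any common or recommended techniques for using a word's context to improve the accuracy of part-of-speech tagging?

For example, if I have the sentence: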

I played golf on a links.



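The word "links" can be singular (a golf course) or plural. I tried this sentence in several grammar checkers, and they all correctly recognized it as valid.

The problem is that they also accept this sentence as correct: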

I clicked on a links.



Is there a good way to use the context (clicking versus playing golf) to infer the correct part of speech?

Thanks!

Best answer

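Deciding whether "links" means "golf course" or "hyperlinks" is a task called word-sense disambiguation (WSD).
Here is what the Wikipedia article on Word-sense disambiguation says about its relationship to part-of-speech tagging: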

In any real test, part-of-speech tagging and sense tagging are very closely related with each potentially making constraints to the other. And the question whether these tasks should be kept together or decoupled is still not unanimously resolved, but recently scientists incline to test these things separately (e.g. in the Senseval/SemEval competitions parts of speech are provided as input for the text to disambiguate). It is instructive to compare the word sense disambiguation problem with the problem of part-of-speech tagging. Both involve disambiguating or tagging with words, be it with senses or parts of speech. However, algorithms used for one do not tend to work well for the other, mainly because the part of speech of a word is primarily determined by the immediately adjacent one to three words, whereas the sense of a word may be determined by words further away. The success rate for part-of-speech tagging algorithms is at present much higher than that for WSD, state-of-the art being around 95% accuracy or better, as compared to less than 75% accuracy in word sense disambiguation with supervised learning. These figures are typical for English, and may be very different from those for other languages.



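I'm not aware of any work that uses WSD to inform POS tagging (although using POS tags to inform WSD is standard). It sounds like a good idea to me, even if the gain in accuracy would be small, since POS tagging accuracy is already high. It could be implemented as a feature in Toutanova's CRF tagger.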
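To make the "sense as a feature" idea concrete, here is a minimal sketch in plain Python. It is not how any existing tagger does it. The sense inventory `SENSE_CUES`, the cue words, and the function names `guess_sense` and `token_features` are all made up for illustration. A crude overlap count over the whole sentence stands in for a real WSD component, and the guessed sense is added to the usual local-window features that a feature-based tagger would consume.

```python
# Hypothetical, hand-written sense inventory: cue words for each sense of "links".
# A real system would use a trained WSD model or a lexical resource instead.
SENSE_CUES = {
    "links": {
        "golf_course": {"golf", "played", "course", "tee", "hole"},
        "hyperlink": {"clicked", "page", "url", "website", "browser"},
    }
}


def guess_sense(tokens, index):
    """Pick the sense whose cue words overlap most with the rest of the sentence."""
    senses = SENSE_CUES.get(tokens[index].lower())
    if not senses:
        return None  # word is not in the toy inventory
    # Sense cues may be far from the target word, so use the whole sentence.
    context = {t.lower() for i, t in enumerate(tokens) if i != index}
    best, best_overlap = None, 0
    for sense, cues in senses.items():
        overlap = len(cues & context)
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best  # None if no cue word was seen


def token_features(tokens, index):
    """Features for tagging tokens[index]: a local window plus a sense feature."""
    feats = {
        "word": tokens[index].lower(),
        "prev": tokens[index - 1].lower() if index > 0 else "<s>",
        "next": tokens[index + 1].lower() if index + 1 < len(tokens) else "</s>",
        "suffix2": tokens[index][-2:].lower(),
    }
    sense = guess_sense(tokens, index)
    if sense is not None:
        feats["sense"] = sense  # the extra, non-local feature
    return feats


golf = "I played golf on a links .".split()
click = "I clicked on a links .".split()
print(token_features(golf, 5))
print(token_features(click, 4))
```

In both sentences the local window around "links" is identical ("a" before, "." after), so only the `sense` feature differs. A tagger trained with it would have a chance to learn that the golf-course sense is compatible with a singular noun tag while the hyperlink sense after "a" is not.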

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/8947701/
