nlp - What is the difference between WordNet 3.1 and WordNet 3.0?

Tags: nlp wordnet

There does not seem to be a changelog or anything similar on wordnet.princeton.edu.

Best answer

To add to @abarisone's answer: the actual synset IDs can differ between WordNet 3.0 and WordNet 3.1 :(

For example, in WordNet 3.1, "chair" is 103005231-n.

In WordNet 3.0, however, it is 103001627-n. But you cannot look it up at http://wordnet-rdf.princeton.edu/wn31/103001627-n or http://wordnet-rdf.princeton.edu/wn30/103001627-n. Instead you need to use http://wordnet-rdf.princeton.edu/wn30/03001627-n, which incorrectly redirects to 102992974-n.

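I think this is a bug in the WordNet RDF 3.1 online app, because 102992974-n does not officially exist. You cannot even search for it (online or offline). And if you fetch the RDF/JSON-LD file from that page, it gives you 103005231-n.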
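The IDs quoted above come in two shapes: a 9-digit form such as 103005231-n and an 8-digit form such as 03001627-n. A minimal sketch for converting between them is shown below. The helper names are hypothetical, and the idea that the 9-digit form is simply an extra leading digit in front of the 8-digit offset is inferred only from the examples in this answer, not from any documentation of the RDF service.

```python
import re

# Pattern inferred from the IDs quoted in the answer (an assumption):
# an optional leading digit, an 8-digit offset, a hyphen, and a
# one-letter part-of-speech tag.
_ID_PATTERN = re.compile(r"^(\d?)(\d{8})-([a-z])$")


def split_synset_id(synset_id):
    """Split an ID such as '103005231-n' into (prefix, offset, pos)."""
    match = _ID_PATTERN.match(synset_id)
    if match is None:
        raise ValueError(f"unrecognized synset ID: {synset_id!r}")
    return match.groups()


def short_form(synset_id):
    """Return the 8-digit form of an ID, dropping any leading digit."""
    _prefix, offset, pos = split_synset_id(synset_id)
    return f"{offset}-{pos}"
```

With this, `short_form("103001627-n")` yields the 8-digit form that the wn30 URL above accepts.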

In wn3.1.dict/dict/index.noun:

chair n 5 4 @ ~ %p + 5 2 03005231 00599171 10488547 03275941 03005700  

02992974 is not mentioned anywhere in that file.
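To check which offsets an index entry actually lists, the line can be split into its fields. The sketch below assumes the index file layout as I understand it from the WordNet database format description: lemma, part of speech, synset count, pointer count, that many pointer symbols, sense count, tagged sense count, then the synset offsets. The function name `parse_index_line` is hypothetical.

```python
def parse_index_line(line):
    """Parse one data line of a WordNet index.<pos> file into a dict.

    Assumed field order: lemma, pos, synset_cnt, p_cnt, p_cnt pointer
    symbols, sense_cnt, tagsense_cnt, then synset_cnt synset offsets.
    """
    fields = line.split()
    lemma, pos = fields[0], fields[1]
    synset_cnt = int(fields[2])
    p_cnt = int(fields[3])
    # The pointer symbols occupy the next p_cnt fields.
    pointers = fields[4:4 + p_cnt]
    rest = fields[4 + p_cnt:]
    sense_cnt, tagsense_cnt = int(rest[0]), int(rest[1])
    # The remaining fields are the 8-digit synset offsets.
    offsets = rest[2:2 + synset_cnt]
    return {
        "lemma": lemma,
        "pos": pos,
        "pointers": pointers,
        "sense_cnt": sense_cnt,
        "tagsense_cnt": tagsense_cnt,
        "offsets": offsets,
    }


# The index.noun entry quoted in the answer.
entry = parse_index_line(
    "chair n 5 4 @ ~ %p + 5 2 03005231 00599171 10488547 03275941 03005700"
)
has_redirect_target = "02992974" in entry["offsets"]
```

For the quoted entry, `entry["offsets"]` holds the five offsets from the line, and `has_redirect_target` is `False`, which matches the observation that 02992974 does not appear.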

Both issues are confusing. I wonder why they changed the synset IDs in a minor release.

On the status of WordNet synset IDs:

The conclusion is that, for now, it is safest to use WordNet 3.0 synset IDs.

For future work, consider using the Global WordNet Association's Inter-Lingual Index (coming soon), which will have IDs compatible with WordNet 3.0.

Quotes from the wn-users mailing list, 30 Oct 2015:

From: Raphael, Nicholas

The URI is built from the “dblocation” field, which is a byte offset from the beginning of the relevant character-based database file (I’m not sure which). This will change from release to release as items are removed and added and moved around.





From: Peter Clark

To the best of my knowledge…. FYI a little known fact is that the sense keys (e.g., “ability%1:07:00::”) are stable between releases, except when senses are split or merged. This provides a stable way to refer to synsets across releases, rather than use synset numbers. Also you can find the mappings between synset numbers in different releases by looking for the same sense keys. (sensekey->synset is a many-to-1 mapping: A synset may have multiple sense keys, one for each word+sense in the synset. But a sense key maps to exactly one synset). Best wishes, Pete





From: John McCrae

Hello Hendy,

Yes WordNet synset Identifiers are based on the byte offset of the descriptor in a given release of WordNet, as such they are far from stable across versions of WordNets. The sense identifiers are more stable but still can be unreliable as sense do get split and merged. Also, there are two slightly different versions of WordNet 3.1 and the WordNet RDF version accepts synset identifiers from either... this is of course, as others have commented, all very confusing.

For this reason, the Global WordNet Association has started work on an Inter-Lingual Index, which we expect to be online soon (i.e., in time for the Global WordNet Conference in January), and will give each synset a single unchanging URI.

Piek Vossen gave a good talk about this recently and this slides are online here: http://ldl2014.org/slides/Vossen-LOD-CILI.pdf

For the moment, I would recommend using WN 3.0 identifiers to link synsets, which the WordNet Interlingual Index will also be based on.

Regards, John
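Peter Clark's suggestion above, matching synsets across releases through shared sense keys, can be sketched as follows. This assumes each line of an index.sense file has the form "sense_key synset_offset sense_number tag_cnt" and that the digit right after "%" in a sense key is the synset type. The function names are hypothetical. The sample lines use the two offsets for "chair" from the answer, but the sense key and the counts in them are assumed for illustration and not copied from the real files.

```python
def read_sense_index(lines):
    """Map sense_key -> (ss_type, synset_offset) from index.sense-style lines."""
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        sense_key, offset = parts[0], parts[1]
        # The synset type digit follows '%' in the sense key (assumed layout).
        ss_type = sense_key.rpartition("%")[2].split(":")[0]
        mapping[sense_key] = (ss_type, offset)
    return mapping


def map_offsets(old_lines, new_lines):
    """Map each old (ss_type, offset) to the new ones sharing a sense key.

    A set is used as the value because split or merged senses can make
    one old synset correspond to several new ones.
    """
    old_index = read_sense_index(old_lines)
    new_index = read_sense_index(new_lines)
    result = {}
    for sense_key, old_id in old_index.items():
        new_id = new_index.get(sense_key)
        if new_id is not None:
            result.setdefault(old_id, set()).add(new_id)
    return result


# Illustrative lines only: the sense key and counts are assumptions.
wn30_sample = ["chair%1:06:00:: 03001627 1 35"]
wn31_sample = ["chair%1:06:00:: 03005231 1 35"]
mapping = map_offsets(wn30_sample, wn31_sample)
```

In practice the two arguments would be the opened index.sense files of the two releases. Old synsets whose sense keys no longer exist are simply absent from the result.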

For "nlp - What is the difference between WordNet 3.1 and WordNet 3.0?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32423369/
