nlp - Reconstructing the now-famous 17-year-old's Markov-chain-based information retrieval algorithm "Apodora"

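While the rest of us were scratching our heads, a 17-year-old Canadian boy has apparently found an information retrieval algorithm that:

a) performs with twice the precision of the currently widely used vector space model

b) is "fairly accurate" at identifying similar words

c) makes microsearch more accurate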
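For reference, the vector space model that these claims are measured against can be sketched in a few lines. This is only a minimal TF-IDF and cosine-similarity baseline; the toy corpus, the smoothed IDF formula, and the function names are assumptions made for illustration, not anything taken from the interview or the abstract.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed smoothing: +1 keeps every IDF weight strictly positive.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vecs, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(query, docs):
    """Score every document against the query; return (ranking, scores)."""
    vecs, idf = tfidf_vectors(docs)
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query).items()}
    scores = [cosine(q, v) for v in vecs]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order, scores

# Hypothetical toy corpus.
docs = [
    ["car", "engine", "repair"],
    ["automobile", "engine", "mechanic"],
    ["banana", "bread", "recipe"],
]
order, scores = rank(["car"], docs)
print(order[0])   # index of the best match
print(scores[1])  # the "automobile" document shares no term with the query
```

With a one-word query, the document about automobiles scores exactly zero because it has no term in common with the query. That is the kind of short-query weakness an algorithm that follows word relationships further could plausibly improve on.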

Here is a good interview.

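Unfortunately, I have not yet found a published paper. Still, from the snippets I remember from the graphical models and machine learning classes I took a few years ago, I think we should be able to reconstruct it from the abstract he submitted and from what he says about it in the interview.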

From the interview:

Some searches find words that appear in similar contexts. That’s pretty good, but that’s following the relationships to the first degree. My algorithm tries to follow connections further. Connections that are close are deemed more valuable. In theory, it follows connections to an infinite degree.
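One way to read "follow connections further, with close connections deemed more valuable" is as a damped sum over powers of a word-to-word transition matrix, S = αP + α²P² + α³P³ + ..., which for 0 < α < 1 and a row-stochastic P equals (I − αP)⁻¹ − I in the infinite limit. The sketch below is a guess at that idea, not the actual Apodora algorithm: the co-occurrence construction, the damping factor `alpha`, the truncation at `hops`, and all function names are assumptions.

```python
from itertools import permutations

def cooccurrence_transitions(docs):
    """Row-normalized co-occurrence matrix: P[i][j] is the share of word i's
    within-document co-occurrences that involve word j."""
    vocab = sorted({t for d in docs for t in d})
    idx = {t: i for i, t in enumerate(vocab)}
    n = len(vocab)
    counts = [[0.0] * n for _ in range(n)]
    for d in docs:
        for a, b in permutations(set(d), 2):
            counts[idx[a]][idx[b]] += 1.0
    P = []
    for row in counts:
        total = sum(row)
        P.append([c / total if total else 0.0 for c in row])
    return vocab, P

def matmul(A, B):
    """Plain dense matrix product for small list-of-lists matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def damped_relatedness(P, alpha=0.5, hops=50):
    """Truncated sum of (alpha * P)^k for k = 1..hops.
    Each extra hop is discounted by alpha, so close links count more."""
    n = len(P)
    total = [[0.0] * n for _ in range(n)]
    term = [[float(i == j) for j in range(n)] for i in range(n)]  # (alpha * P)^0
    for _ in range(hops):
        term = [[alpha * x for x in row] for row in matmul(term, P)]
        total = [[t + x for t, x in zip(tr, xr)] for tr, xr in zip(total, term)]
    return total

# Hypothetical toy corpus: "car" and "automobile" never co-occur directly,
# but both co-occur with "engine".
docs = [
    ["car", "engine", "repair"],
    ["automobile", "engine", "mechanic"],
    ["banana", "bread", "recipe"],
]
vocab, P = cooccurrence_transitions(docs)
S = damped_relatedness(P)
i = vocab.index
print(P[i("car")][i("automobile")])      # first-degree link: 0.0
print(S[i("car")][i("automobile")] > 0)  # reachable through "engine": True
print(S[i("car")][i("banana")])          # no path at all: 0.0
```

In this toy setup the first-degree relationship between "car" and "automobile" is zero, while the damped multi-hop score is positive yet smaller than the direct "car" to "engine" score, which matches the behavior described in the quote.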

The abstract puts it into context:

A novel information retrieval algorithm called "Apodora" is introduced, using limiting powers of Markov chain-like matrices to determine models for the documents and making contextual statistical inferences about the semantics of words. The system is implemented and compared to the vector space model. Especially when the query is short, the novel algorithm gives results with approximately twice the precision and has interesting applications to microsearch.
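To make "limiting powers" concrete: for an ordinary ergodic Markov chain, every row of P^k converges to the same stationary distribution as k grows. The sketch below demonstrates this on a hypothetical 3-state matrix using repeated squaring; the matrix and the function names are illustrative assumptions and say nothing definite about how Apodora builds or uses its matrices.

```python
def mat_mul(A, B):
    """Plain dense matrix product for small list-of-lists matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def limiting_power(P, squarings=6):
    """Approximate the limit of P^k by repeated squaring: returns P^(2^squarings)."""
    M = P
    for _ in range(squarings):
        M = mat_mul(M, M)
    return M

# Hypothetical row-stochastic matrix (irreducible and aperiodic).
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
L = limiting_power(P)
for row in L:
    print([round(x, 6) for x in row])
```

All three rows come out as approximately (0.25, 0.5, 0.25), the stationary distribution of this chain. Since the limit of a strictly stochastic ergodic matrix forgets the starting state, this may be a hint as to why the abstract says "Markov chain-like": some damping or other modification would be needed for the limit to keep word-specific information. That interpretation is speculation.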

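I feel like someone who knows about Markov-chain-like matrices or information retrieval would immediately realize what he is doing.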

So: what is he doing?

Best answer

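From the use of words like "context" and the fact that he has introduced a second-order level of statistical dependency, I suspect he is doing something related to the LDA-HMM method outlined in the paper: Griffiths, T., Steyvers, M., Blei, D., & Tenenbaum, J. (2005). Integrating topics and syntax. Advances in Neural Information Processing Systems. There are some inherent limits to the resolution of the search due to model averaging. That said, I am envious of anyone doing work like this at 17, and I hope he has done something independently that is at least somewhat better. Even a different direction on the same topic would be pretty cool.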

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/6967792/
