algorithm - 反向多重索引

标签 algorithm search nearest-neighbor inverted-index quantization

我试图从这个 paper 中理解反向多索引,它还有一个较小的版本 here .为此,我构建了一个玩具示例,并希望有人验证或/并与我分享他/她的意见。

例子:

Assume we have N = 6 points in M = 4 dimensions. We use two blocks to
create sub-vecrtors. Let the points be these:

p0 = (0, 0, 1, 1)
p1 = (2, 2, 3, 3)
p2 = (4, 4, 5, 5)
p3 = (6, 6, 7, 7)
p4 = (8, 8, 9, 9)
p5 = (10, 10, 11, 11) // p5^1 = (10, 10), which is appended in D^1 etc.

_________________________________________________________________________

We run k-means twice, once for D^1 and once for D^2, by requiring 3
centroids and we get:
U = { u1 = (1, 1), u2 = (5, 5), u3 = (9, 9) }
V = { v1 = (2, 2), v2 = (6, 6), v3 = (10, 10) }

Now we have to assign the points to the Wi,j. All we be empty, except:

W1,1 = (u1, v1): p0, p1
W2,2 = (u2, v2): p2, p3
W3,3 = (u3, v3): p4, p5

Note: We store the PQ-copressed version with every point ( for example
instead of storing just p0, store [p0, (u1, v1)] ), which will be used
during reranking.
_________________________________________________________________________

Assume T = 4 (which sets L = 2 for the sake of the example) and the query
is: (5, 5, 0, 0). Posing the query...


q^1 VS U
i   u_α(i) r
1   u2     0
2   u1    32 <-- |d1( (5,5), (1,1) )|^2 = (5 - 1)^2 + (5 - 1)^2 = 32

q^2 VS V
j   v_β(j) s
1   v1     8
2   v2    72

Invoke the Multi Sequence Algorithm which will output:
W2,1 --> [u2 v1] --> empty
W1,1 --> [u1 v1] --> p0, p1
W2,2 --> [u2 v2] --> p4, p5
// stop when the points that lie in the Ws returned so far are >= T

So the candidate list is {p0, p1, p2, p3}
_________________________________________________________________________

Rerank the candidate list, by computing the distance between the query and
the PQ-compressed representation (sum the distances of the subvectors).

请问这样正确吗?

最佳答案

是的,这似乎是正确的,正如 Babenko 本人在电子邮件对话中提到的那样。

关于algorithm - 反向多重索引,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/38464183/

相关文章:

machine-learning - 当 k=4 时 KNN 选择类标签

python - 如何哈希列表?

查找字符串不同的第一个索引的算法?

algorithm - XOR在算法中有哪些实际应用

algorithm - 点之间的最简单路径

c++ - 反向链表的递归函数(代码片段解释)

python - 如何在文本文件中找到最相关的字符串?

search - 搜索选项错误中的提取文档数据

python - 获取有序 CSV 文件中高于特定 unix 时间戳的行号的有效方法

algorithm - 不同数据结构上最近邻查询的运行时间比较