algorithm - Why is modulo 65521 used in the Adler-32 checksum algorithm?

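The Adler-32 checksum algorithm computes its sums modulo 65521. I know that 65521 is the largest prime number that fits in 16 bits, but why is it important to use a prime in this algorithm?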
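The "largest prime that fits in 16 bits" claim is easy to check by searching downward from 2^16 with trial division. This is a minimal sketch; `is_prime` and `largest_prime_below` are hypothetical helper names written for this illustration, not part of any library.

```python
def is_prime(n: int) -> bool:
    """Trial division; fast enough for 16-bit numbers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def largest_prime_below(limit: int) -> int:
    """Return the largest prime strictly less than `limit`."""
    for n in range(limit - 1, 1, -1):
        if is_prime(n):
            return n
    raise ValueError("no prime below limit")


largest_16bit_prime = largest_prime_below(1 << 16)
print(largest_16bit_prime)  # 65521
print(is_prime(65535))      # False: 65535 = 3 * 5 * 17 * 257
```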

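(I'm sure the answer will seem obvious once someone tells me, but the number-theory part of my brain isn't working. Even without any expertise in checksum algorithms, a smart person who reads http://en.wikipedia.org/wiki/Fletcher%27s_checksum can probably explain it to me.)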

Best answer

Why does Adler-32 use a prime modulus?

From Adler's own website, http://zlib.net/zlib_tech.html:

However, Adler-32 has been constructed to minimize the ways to make small changes in the data that result in the same check value, through the use of sums significantly larger than the bytes and by using a prime (65521) for the modulus. It is in this area that some analysis is deserved, but it has not yet been done.

The main reason for Adler-32 is, of course, speed in software implementations.
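A minimal Python sketch of the algorithm the quote describes: a running byte sum A (starting at 1) and a running sum of the A values B, both reduced modulo 65521 and packed into one 32-bit value. The function name `adler32` is hypothetical here; the result is cross-checked against Python's standard `zlib.adler32`.

```python
import zlib

MOD_ADLER = 65521  # largest prime below 2**16


def adler32(data: bytes) -> int:
    """Byte-at-a-time Adler-32: A sums the bytes, B sums the running A values."""
    a, b = 1, 0  # A starts at 1 so that leading zero bytes still change B
    for byte in data:
        a = (a + byte) % MOD_ADLER
        b = (b + a) % MOD_ADLER
    return (b << 16) | a  # B in the high 16 bits, A in the low 16 bits


sample = b"Wikipedia"
print(hex(adler32(sample)))                     # 0x11e60398
print(adler32(sample) == zlib.adler32(sample))  # True
```

This byte-at-a-time loop is written for clarity, not speed. zlib's own C implementation gets its speed by deferring the modulo over runs of up to 5552 bytes (its NMAX constant), the longest run for which the 32-bit sums cannot overflow.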

An alternative to Adler-32 is Fletcher-32, which replaces the modulo of 65521 with 65535. This paper shows that Fletcher-32 is superior for channels with low-rate random bit errors.
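For comparison, a sketch of Fletcher-32 with the composite modulus 65535 (= 3 × 5 × 17 × 257). It assumes the convention used in the Wikipedia article linked in the question: 16-bit words read little-endian, an odd trailing byte padded with zero, and both sums starting at 0. Real implementations differ on these details, and `fletcher32` is a hypothetical name.

```python
MOD_FLETCHER = 65535  # 2**16 - 1, composite


def fletcher32(data: bytes) -> int:
    """Fletcher-32 over little-endian 16-bit words (assumed convention)."""
    if len(data) % 2:
        data = data + b"\x00"  # pad an odd trailing byte with zero
    sum1, sum2 = 0, 0
    for i in range(0, len(data), 2):
        word = data[i] | (data[i + 1] << 8)
        sum1 = (sum1 + word) % MOD_FLETCHER
        sum2 = (sum2 + sum1) % MOD_FLETCHER
    return (sum2 << 16) | sum1


print(hex(fletcher32(b"abcde")))                            # 0xf04fc729
print(fletcher32(b"\x00\x00") == fletcher32(b"\xff\xff"))  # True
```

The last line shows one concrete consequence of the modulus: 0xFFFF ≡ 0 (mod 65535), so a word of all zero bits and a word of all one bits contribute identically and cannot be told apart. In Adler-32 each input byte is at most 255, so no two different byte values are congruent modulo 65521.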

A prime is used because primes tend to have better mixing properties. Exactly how much better is open to debate.

Other notes

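Others in this thread have made a somewhat convincing argument that a prime modulus is better at detecting bit swaps. However, that is probably not the deciding factor, because bit swaps are extremely rare. The two most common errors are: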

Random bit flips (1 <-> 0), which occur everywhere.
Bit shifts (1 2 3 4 5 -> 2 3 4 5 or 1 1 2 3 4 5), which are common in networking.
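Both error classes can be exercised directly against Python's `zlib.adler32`. A single bit flip changes one byte by a power of two no larger than 128, so it always changes the sum A modulo 65521; the sketch checks this exhaustively for one sample message and then tries the two shift examples from the list, treating the listed digits as byte values (an assumption). `flip_bit` and `undetected_single_bit_flips` are hypothetical helpers.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with one bit inverted (a random bit-flip error)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def undetected_single_bit_flips(data: bytes) -> int:
    """Count single-bit flips of `data` that leave its Adler-32 unchanged."""
    original = zlib.adler32(data)
    return sum(
        zlib.adler32(flip_bit(data, i)) == original
        for i in range(len(data) * 8)
    )


message = b"The quick brown fox jumps over the lazy dog"
print(undetected_single_bit_flips(message))  # 0

# The shift examples from the list, digits treated as byte values.
sent = bytes([1, 2, 3, 4, 5])
dropped = bytes([2, 3, 4, 5])
duplicated = bytes([1, 1, 2, 3, 4, 5])
print(zlib.adler32(dropped) != zlib.adler32(sent))     # True
print(zlib.adler32(duplicated) != zlib.adler32(sent))  # True
```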

Most apparent bit swaps are really random bit flips that happen to look like a swap.

Error-correcting codes are actually designed to withstand n-bit deviations. From Adler's website:

A properly constructed CRC-n has the nice property that less than n bits in error is always detectable. This is not always true for Adler-32--it can detect all one- or two-byte errors but can miss some three-byte errors.
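A concrete three-byte error that Adler-32 misses: add 1, subtract 2, and add 1 to three consecutive bytes. The plain sum A is unchanged because 1 - 2 + 1 = 0, and the position-weighted sum B is unchanged because w - 2(w - 1) + (w - 2) = 0 for any weight w. This sketch assumes the bytes stay within 0..255 after the change; `second_difference_error` is a hypothetical helper. CRC-32, by contrast, detects every error burst of at most 32 bits, so it catches this pattern.

```python
import zlib


def second_difference_error(data: bytes, pos: int) -> bytes:
    """Add +1, -2, +1 to the bytes at pos, pos+1, pos+2.

    bytearray raises ValueError if any byte would leave 0..255.
    """
    buf = bytearray(data)
    buf[pos] += 1
    buf[pos + 1] -= 2
    buf[pos + 2] += 1
    return bytes(buf)


sent = b"abc"
received = second_difference_error(sent, 0)
print(received)                                      # b'b`d'
print(zlib.adler32(sent) == zlib.adler32(received))  # True: Adler-32 misses it
print(zlib.crc32(sent) == zlib.crc32(received))      # False: CRC-32 catches it
```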

Effect of using a prime modulus

I wrote a long article on essentially the same question: why take the modulus by a prime?

http://www.codexon.com/posts/hash-functions-the-modulo-prime-myth

Short answer

Much less is known about primes than about composite numbers, so people like Knuth started using them.

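Although primes have little to do with most of the data we hash, increasing the table or modulus size also lowers the probability of collisions (sometimes by more than any benefit gained from rounding down to the nearest prime).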

Here is a graph comparing the number of collisions per bucket for mod 65521 vs. mod 65535, using 10 million cryptographically random integers.
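A scaled-down sketch of the kind of experiment described above: draw cryptographically random integers with Python's `secrets.randbits`, bucket them with a plain modulo by 65521 and by 65535, and count collisions. The sample size (200,000 rather than 10 million), the 32-bit value range, and the helper names `bucket_counts` and `collisions` are assumptions made for illustration; the exact method behind the original graph is not given here. No output is shown because it depends on the random draw.

```python
import secrets


def bucket_counts(values, modulus):
    """Histogram of value % modulus over all buckets."""
    counts = [0] * modulus
    for v in values:
        counts[v % modulus] += 1
    return counts


def collisions(counts):
    """Items that landed in a bucket already holding at least one item."""
    return sum(c - 1 for c in counts if c > 1)


N = 200_000  # scaled down from 10 million for a quick run
values = [secrets.randbits(32) for _ in range(N)]

for modulus in (65521, 65535):
    counts = bucket_counts(values, modulus)
    print(modulus, "collisions:", collisions(counts), "fullest bucket:", max(counts))
```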

A similar question, "algorithm - Why is modulo 65521 used in the Adler-32 checksum algorithm?", was found on Stack Overflow: https://stackoverflow.com/questions/927277/
