hash - adler32 哈希的可怕冲突

标签 hash hash-collision adler32

当使用 adler32() 作为散列函数时,应该会出现罕见的冲突。

我们可以精确计算碰撞概率,但粗略地说,
既然是 32 位哈希函数,应该不会有太多冲突
在几千个项目的样本集上。

情况并非如此。

这是一个示例:让我们采用中间包含日期的字符串,例如

"Some prefix text " + date + " some postfix text."

其中日期格式为 yyyy-mm-dd,并在 2012 年循环。

在这个例子中有 91 次碰撞!

更糟糕的是:有 7 种情况是 3 个日期发生冲突。

这么常用的哈希函数怎么性能这么差呢?
或者我错过了什么?

下面是上面例子的详细结果:
0x592a0f1f: 2012-01-30, 2012-02-02, 2012-10-21
0x592b0f1f: 2012-02-11, 2012-10-30, 2012-11-02
0x593d0f20: 2012-01-31, 2012-02-03, 2012-10-22
0x593e0f20: 2012-02-12, 2012-10-31, 2012-11-03
0x59410f20: 2012-03-11, 2012-11-30, 2012-12-02
0x59560f21: 2012-03-30, 2012-04-02, 2012-12-21
0x59690f22: 2012-03-31, 2012-04-03, 2012-12-22

0x59020f1d: 2012-01-10, 2012-10-01
0x59150f1e: 2012-01-11, 2012-10-02
0x59160f1e: 2012-01-20, 2012-10-11
0x59170f1e: 2012-02-01, 2012-10-20
0x59180f1e: 2012-02-10, 2012-11-01
0x59280f1f: 2012-01-12, 2012-10-03
0x59290f1f: 2012-01-21, 2012-10-12
0x592c0f1f: 2012-02-20, 2012-11-11
0x592d0f1f: 2012-03-01, 2012-11-20
0x592e0f1f: 2012-03-10, 2012-12-01
0x593b0f20: 2012-01-13, 2012-10-04
0x593c0f20: 2012-01-22, 2012-10-13
0x593f0f20: 2012-02-21, 2012-11-12
0x59400f20: 2012-03-02, 2012-11-21
0x59420f20: 2012-03-20, 2012-12-11
0x59430f20: 2012-04-01, 2012-12-20
0x594e0f21: 2012-01-14, 2012-10-05
0x594f0f21: 2012-01-23, 2012-10-14
0x59500f21: 2012-02-04, 2012-10-23
0x59510f21: 2012-02-13, 2012-11-04
0x59520f21: 2012-02-22, 2012-11-13
0x59530f21: 2012-03-03, 2012-11-22
0x59540f21: 2012-03-12, 2012-12-03
0x59550f21: 2012-03-21, 2012-12-12
0x59570f21: 2012-04-11, 2012-12-30
0x59610f22: 2012-01-15, 2012-10-06
0x59620f22: 2012-01-24, 2012-10-15
0x59630f22: 2012-02-05, 2012-10-24
0x59640f22: 2012-02-14, 2012-11-05
0x59650f22: 2012-02-23, 2012-11-14
0x59660f22: 2012-03-04, 2012-11-23
0x59670f22: 2012-03-13, 2012-12-04
0x59680f22: 2012-03-22, 2012-12-13
0x596a0f22: 2012-04-12, 2012-12-31
0x596c0f22: 2012-04-30, 2012-05-02
0x59740f23: 2012-01-16, 2012-10-07
0x59750f23: 2012-01-25, 2012-10-16
0x59760f23: 2012-02-06, 2012-10-25
0x59770f23: 2012-02-15, 2012-11-06
0x59780f23: 2012-02-24, 2012-11-15
0x59790f23: 2012-03-05, 2012-11-24
0x597a0f23: 2012-03-14, 2012-12-05
0x597b0f23: 2012-03-23, 2012-12-14
0x597c0f23: 2012-04-04, 2012-12-23
0x59820f23: 2012-05-30, 2012-06-02
0x59870f24: 2012-01-17, 2012-10-08
0x59880f24: 2012-01-26, 2012-10-17
0x59890f24: 2012-02-07, 2012-10-26
0x598a0f24: 2012-02-16, 2012-11-07
0x598b0f24: 2012-02-25, 2012-11-16
0x598c0f24: 2012-03-06, 2012-11-25
0x598d0f24: 2012-03-15, 2012-12-06
0x598e0f24: 2012-03-24, 2012-12-15
0x598f0f24: 2012-04-05, 2012-12-24
0x59950f24: 2012-05-31, 2012-06-03
0x59980f24: 2012-06-30, 2012-07-02
0x599a0f25: 2012-01-18, 2012-10-09
0x599b0f25: 2012-01-27, 2012-10-18
0x599c0f25: 2012-02-08, 2012-10-27
0x599d0f25: 2012-02-17, 2012-11-08
0x599e0f25: 2012-02-26, 2012-11-17
0x599f0f25: 2012-03-07, 2012-11-26
0x59a00f25: 2012-03-16, 2012-12-07
0x59a10f25: 2012-03-25, 2012-12-16
0x59a20f25: 2012-04-06, 2012-12-25
0x59ae0f25: 2012-07-30, 2012-08-02
0x59ae0f26: 2012-01-28, 2012-10-19
0x59af0f26: 2012-02-09, 2012-10-28
0x59b00f26: 2012-02-18, 2012-11-09
0x59b10f26: 2012-02-27, 2012-11-18
0x59b20f26: 2012-03-08, 2012-11-27
0x59b30f26: 2012-03-17, 2012-12-08
0x59b40f26: 2012-03-26, 2012-12-17
0x59b50f26: 2012-04-07, 2012-12-26
0x59c10f26: 2012-07-31, 2012-08-03
0x59c40f26: 2012-08-30, 2012-09-02
0x59c40f27: 2012-02-28, 2012-11-19
0x59c50f27: 2012-03-09, 2012-11-28
0x59c60f27: 2012-03-18, 2012-12-09
0x59c70f27: 2012-03-27, 2012-12-18
0x59c80f27: 2012-04-08, 2012-12-27
0x59d70f27: 2012-08-31, 2012-09-03
0x59da0f28: 2012-03-28, 2012-12-19
0x59db0f28: 2012-04-09, 2012-12-28

最佳答案

Adler-32 从未打算成为也不是哈希函数。它的目的是解压后的错误检测。它很好地服务于这个目的,因为它速度快,而且压缩数据中的错误会被解压缩器放大。

在您给出的示例中,您在非常短的字符串上使用 Adler-32,为此它甚至没有机会使用所有 32 位。 Adler-32 至少需要几百个字节才能滚动。

有许多非加密散列函数非常快并且具有良好的散列行为,包括避免冲突。看看CityHash哈希函数族。

如果您需要加密(不可欺骗)哈希函数,请查看 SHA-2SHA-3 .

关于hash - adler32 哈希的可怕冲突,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/13455067/

相关文章:

Java哈希冲突概率

java - 如果采用线性探测,删除是否比单独链接的情况更便宜?

javascript - 如何从 JavaScript 对象生成校验和?

php - 哈希密码功能无法正常工作

ruby - 从 Jekyll 插件返回目录中的文件列表?

Python hash() 无法处理长整数?

c - Else 语句在 strcmp 中返回错误结果(比较哈希值)(已更新)

java - 我应该使用什么作为空的哈希码?

c - 消息的 Adler32 + adler 总和是否为零(如 CRC32)

java - 这是 Java 的 Inflater 中的错误还是什么?