python - md5sum in a shell script and Python's hashlib.md5 give different results

I am comparing two qcow2 image files in two different locations to see whether they differ: /opt/images/file.qcow2 and /mnt/images/file.qcow2

When I run

md5sum /opt/images/file.qcow2 
md5sum  /mnt/images/file.qcow2

both files have the same checksum.

But when I try to compute the md5sum with the following code

def isImageLatest(file1,file2):
    print('Checking md5sum of {} {}'.format(file1, file2))

    if os.path.isfile(file1) and os.path.isfile(file2):
        md5File1 = hashlib.md5(file1).hexdigest()
        md5File2 = hashlib.md5(file2).hexdigest()
        print('md5sum of {} is {}'.format(file1, md5File1))
        print('md5sum of {} is {}'.format(file2, md5File2))
    else:
        print('Either {} or {} File not found'.format(file1,file2))
        return False

    if md5File1 == md5File2:
        return True
    else:
        return False

it says the checksums are different.

Update: the files can be up to 8 GB in size.

Best answer

You are hashing the path of the file, not its contents...

hashlib.md5(file1).hexdigest()  # file1 == '/path/to/file.ext'
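To make the difference concrete, here is a minimal sketch that hashes a path string and the file's bytes separately. The helper names and the a.img / b.img files are hypothetical, created only for the demonstration. Note that in Python 3, hashlib.md5() only accepts bytes, so passing a str path raises TypeError instead of silently hashing the wrong thing.

```python
import hashlib
import os
import tempfile


def hash_of_path_string(path):
    """MD5 of the path text itself (what the question's code computes)."""
    # Python 3 requires bytes here; passing a str raises TypeError.
    return hashlib.md5(path.encode("utf-8")).hexdigest()


def hash_of_file_content(path):
    """MD5 of the bytes stored in the file (what md5sum computes)."""
    with open(path, "rb") as f:
        # Reading everything at once is acceptable only for a tiny demo file.
        return hashlib.md5(f.read()).hexdigest()


def demo():
    """Create two hypothetical files with identical content and compare."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.img")
        b = os.path.join(tmp, "b.img")
        for p in (a, b):
            with open(p, "wb") as f:
                f.write(b"same bytes")
        return {
            "paths_equal": hash_of_path_string(a) == hash_of_path_string(b),
            "contents_equal": hash_of_file_content(a) == hash_of_file_content(b),
        }


if __name__ == "__main__":
    print(demo())
```

The two path strings differ, so their digests differ even though the files hold the same bytes, which is the mismatch described in the question.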

To hash the content:

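import hashlib
import os

def md5(fname):
    """Return the MD5 hex digest of the file's contents."""
    hash_md5 = hashlib.md5()
    with open(fname, "rb") as f:
        # Read in 16 KiB chunks so large files never sit fully in memory.
        for chunk in iter(lambda: f.read(16384), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()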

def isImageLatest(file1,file2):
    print('Checking md5sum of {} {}'.format(file1, file2))

    if os.path.isfile(file1) and os.path.isfile(file2):
        md5File1 = md5(file1)
        md5File2 = md5(file2)
        print('md5sum of {} is {}'.format(file1, md5File1))
        print('md5sum of {} is {}'.format(file2, md5File2))
    else:
        print('Either {} or {} File not found'.format(file1,file2))
        return False

    return md5File1 == md5File2
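Since the files can reach 8 GB, it may be worth skipping the hash entirely when the sizes already differ. The following is only a sketch of that idea, not part of the original answer; the names content_md5 and files_match are hypothetical.

```python
import hashlib
import os


def content_md5(path, chunk_size=16384):
    """MD5 hex digest of a file's contents, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(path1, path2):
    """Return True only if both paths are files with identical contents."""
    if not (os.path.isfile(path1) and os.path.isfile(path2)):
        return False
    # Files of different sizes cannot be identical, so avoid reading them.
    if os.path.getsize(path1) != os.path.getsize(path2):
        return False
    return content_md5(path1) == content_md5(path2)
```

The size check is cheap metadata access, so two images of different length are rejected without reading any data; images of equal length still go through the full chunked hash.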

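Side note: you may want to use hashlib.sha1() (the counterpart of Unix's sha1sum) instead, because MD5 is broken and deprecated... Note that SHA-1 also has known practical collision attacks, so hashlib.sha256() (the counterpart of sha256sum) is the safer choice where collision resistance matters; for spotting accidental differences between two copies, any of them works.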
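Switching algorithms does not require rewriting the loop. As a rough sketch, hashlib.new() accepts the algorithm name as a string, so one helper can serve md5, sha1 and sha256; digest_stream is a hypothetical name, and it takes any binary file object rather than a path.

```python
import hashlib
import io


def digest_stream(fileobj, algorithm="sha256", chunk_size=16384):
    """Hex digest of a binary file object using the named hashlib algorithm."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # An in-memory stream stands in for open(path, "rb") in this demo.
    for name in ("md5", "sha1", "sha256"):
        print(name, digest_stream(io.BytesIO(b"hello world"), name))
```

For a real file, pass the object returned by open(path, "rb"); the result should match the output of the corresponding md5sum, sha1sum or sha256sum command.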

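Edit: a benchmark of various buffer sizes (in bytes) and of md5 vs. sha1, run on a slow server (Atom N2800 @1.86GHz) with a 100 MB random file. The md5sum and sha1sum rows are the command-line tools; the MD5 and SHA1 rows are the Python implementations: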

┏━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Algorithm ┃  Buffer ┃    Time (s)   ┃
┡━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│    md5sum │     --- │ 0.387         │
│       MD5 │     2⁶  │ 21.5670549870 │
│       MD5 │     2⁸  │ 6.64844799042 │
│       MD5 │     2¹⁰ │ 3.12886619568 │
│       MD5 │     2¹² │ 1.82865810394 │
│       MD5 │     2¹⁴ │ 1.27349495888 │
│       MD5 │   128¹  │ 11.5235209465 │
│       MD5 │   128²  │ 1.27280807495 │
│       MD5 │   128³  │ 1.16839885712 │
│   sha1sum │    ---  │ 1.013         │
│      SHA1 │     2⁶  │ 23.4520659447 │
│      SHA1 │     2⁸  │ 7.75686216354 │
│      SHA1 │     2¹⁰ │ 3.82775402069 │
│      SHA1 │     2¹² │ 2.52755594254 │
│      SHA1 │     2¹⁴ │ 1.93437695503 │
│      SHA1 │   128¹  │ 12.9430441856 │
│      SHA1 │   128²  │ 1.93382811546 │
│      SHA1 │   128³  │ 1.81412386894 │
└───────────┴─────────┴───────────────┘

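So md5sum is faster than sha1sum, and the Python implementations show the same ordering. A larger buffer improves performance, but only up to a point: 16384 bytes looks like a good trade-off, large enough to be efficient without being excessive.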
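The original benchmark script is not shown, so the following is only a sketch of how such timings could be reproduced. The function names are hypothetical, the demo uses a 1 MiB random file rather than 100 MB, and absolute numbers will differ from the table on other hardware.

```python
import hashlib
import os
import tempfile
import time


def time_hash(path, algorithm, buffer_size):
    """Return (elapsed_seconds, hex_digest) for one pass over the file."""
    digest = hashlib.new(algorithm)
    start = time.perf_counter()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(buffer_size), b""):
            digest.update(chunk)
    return time.perf_counter() - start, digest.hexdigest()


def benchmark(path, algorithms=("md5", "sha1"), exponents=(6, 8, 10, 12, 14)):
    """Time every algorithm with buffer sizes 2**exponent bytes."""
    results = {}
    for algorithm in algorithms:
        for exponent in exponents:
            size = 2 ** exponent
            results[(algorithm, size)] = time_hash(path, algorithm, size)
    return results


def make_random_file(size_bytes):
    """Write size_bytes of random data to a temp file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path


if __name__ == "__main__":
    sample = make_random_file(1024 * 1024)  # 1 MiB keeps the demo quick
    try:
        for (name, size), (seconds, _) in sorted(benchmark(sample).items()):
            print("{:>5} {:>6} bytes {:.4f} s".format(name, size, seconds))
    finally:
        os.remove(sample)
```

The buffer size changes only how many read() and update() calls are made, never the digest itself, so every row for the same algorithm should report the same hash.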

For more on this topic, see the original question on Stack Overflow: https://stackoverflow.com/questions/38181554/
