python - Compressing large files with gzip in Python

I searched for how to compress a file in Python and found an answer that was basically the following:

with open(input_file, 'rb') as f_in, gzip.open(output_file, 'wb') as f_out:
    f_out.write(f_in.read())

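It handles a 1 GB file without trouble. But I plan to compress files of up to 200 GB.

Is there anything I need to take into account? Should I take a different approach for files this large?

The files are binary .img files (dumps of block devices; they usually have empty space at the end, so they compress very well).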

Best answer

f_in.read() reads the whole file into memory, which will cause problems if you don't have 200 GB of RAM available!

You can skip Python and simply pipe the file through the gzip command-line tool, which does the work in chunks on its own:

% gzip -c myfile.img > myfile.img.gz
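
If the compression still has to be started from a Python script, the same command can be launched as a subprocess. This is only a minimal sketch: it assumes a gzip binary is available on the PATH, and the function name gzip_with_cli is made up for illustration.

```python
import subprocess


def gzip_with_cli(input_file, output_file):
    """Run the equivalent of `gzip -c input_file > output_file` without a shell."""
    with open(output_file, "wb") as f_out:
        # gzip streams the data itself, so Python never holds the file in memory.
        # check=True raises CalledProcessError if gzip exits with an error.
        subprocess.run(["gzip", "-c", input_file], stdout=f_out, check=True)
```

Redirecting stdout to an open file object stands in for the shell's > redirection, so no shell is involved.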

Otherwise, you should read the file in chunks (picking a large block size may give some benefit):

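import gzip

BLOCK_SIZE = 8192  # bytes per read; a larger value may be faster

with open(input_file, "rb") as f_in, gzip.open(output_file, "wb") as f_out:
    while True:
        content = f_in.read(BLOCK_SIZE)
        if not content:  # an empty bytes object means end of file
            break
        f_out.write(content)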
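
The standard library's shutil.copyfileobj runs the same read/write loop, so the chunked copy can be written more compactly. The following is a sketch rather than the answer's own code: the function name gzip_file and the 1 MiB chunk size are illustrative assumptions, and the best chunk size is worth measuring on real data.

```python
import gzip
import shutil

CHUNK_SIZE = 1024 * 1024  # 1 MiB; an assumed value, not a tuned one


def gzip_file(input_file, output_file, chunk_size=CHUNK_SIZE):
    """Compress input_file to output_file, holding one chunk in memory at a time."""
    with open(input_file, "rb") as f_in, gzip.open(output_file, "wb") as f_out:
        # copyfileobj reads chunk_size bytes, writes them, and repeats until EOF.
        shutil.copyfileobj(f_in, f_out, length=chunk_size)
```

A call such as gzip_file("disk.img", "disk.img.gz") (hypothetical file names) keeps memory use close to the chunk size regardless of how large the input is.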

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/66250471/
