python - Read a large file in chunks, compress and write in chunks

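I have run into a problem processing some files because of their size. The files keep growing and will continue to grow in the future. Because of a limitation in the third-party application I upload the compressed files to, deflate is the only compression option I can use.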

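The server that runs the script has limited memory, so it hits the usual out-of-memory problems. That is why I am trying to read in chunks and write in chunks, with the output being the required compressed file.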

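Until now I have been using this snippet to compress the files and reduce their size. It worked fine until the files became too large to process/compress in one go.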

with open(file_path_partial, 'rb') as file_upload, open(file_path, 'wb') as file_compressed:
    file_compressed.write(zlib.compress(file_upload.read()))

I have tried a few different options to work around this, but so far none of them has worked correctly.

with open(file_path_partial, 'rb') as file_upload:
    with open(file_path, 'wb') as file_compressed:
        with gzip.GzipFile(file_path_partial, 'wb', fileobj=file_compressed) as file_compressed:
            shutil.copyfileobj(file_upload, file_compressed)

BLOCK_SIZE = 64

compressor = zlib.compressobj(1)

filename = file_path_partial

with open(filename, 'rb') as input:
    with open(file_path, 'wb') as file_compressed:
        while True:            
            block = input.read(BLOCK_SIZE)
            if not block:
                break
            file_compressed.write(compressor.compress(block))

Best answer

The example below reads the input in 64k blocks, modifies each block, and writes it to a gzip file.

Is this what you want?

import gzip

with open("test.txt", "rb") as fin, gzip.GzipFile("modified.txt.gz", "w") as fout:
    while True:
        block = fin.read(65536) # read in 64k blocks
        if not block:
            break
        # comment out the next line to write the data through unchanged
        block = block.replace(b"a", b"A")
        fout.write(block)
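
Note that gzip.GzipFile writes a gzip container, while the zlib.compress call in the question writes a zlib (deflate) stream. If the upload target needs the same format as zlib.compress, the following is a minimal sketch of the same block-by-block loop using zlib.compressobj. The function name compress_file_chunked, its parameters, and the default block size and level are assumptions made for illustration and are not part of the original answer.

```python
import zlib


def compress_file_chunked(src_path, dst_path, block_size=65536, level=6):
    """Write a zlib (deflate) stream of src_path to dst_path, one block at a time.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    compressor = zlib.compressobj(level)
    with open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
        while True:
            block = fin.read(block_size)  # read in 64k blocks by default
            if not block:
                break
            # compress() may return b"" while data is still buffered internally
            fout.write(compressor.compress(block))
        # flush() emits the remaining buffered data and the stream trailer;
        # without this call the output file is incomplete
        fout.write(compressor.flush())
```

A call such as compress_file_chunked(file_path_partial, file_path) should produce output that zlib.decompress can read back, while only one block plus the compressor's internal buffer is held in memory at a time. The question does not say whether the third-party application expects the zlib container or a raw deflate stream; if it is the latter, passing wbits=-15 to zlib.compressobj produces raw deflate data without the zlib header and checksum.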

A similar question about reading a large file in chunks, then compressing and writing it in chunks, can be found on Stack Overflow: https://stackoverflow.com/questions/61880710/
