python - How to extract data from a corrupted gzip file in Python

I am using the Python gzip library to decompress files, and some of them are corrupted. The exact error is:

Error -3 while decompressing: invalid block type

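Is it possible to read all the data up to the point where the file breaks, or to somehow skip over the break and read what comes before and after it? The compressed files are basically lines of text, and I want to recover as much data as possible.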

Thanks

Best answer

Hope someone finds this useful:

import zlib

# http://stackoverflow.com/questions/2423866/python-decompressing-gzip-chunk-by-chunk
# http://stackoverflow.com/questions/3122145/zlib-error-error-3-while-decompressing-incorrect-header-check/22310760
def read_corrupted_file(filename, CHUNKSIZE=1024):
    # MAX_WBITS | 32 lets zlib auto-detect a gzip or zlib header
    d = zlib.decompressobj(zlib.MAX_WBITS | 32)
    with open(filename, 'rb') as f:
        result = b''  # decompressed output is bytes, not str
        buffer = f.read(CHUNKSIZE)
        try:
            while buffer:
                result += d.decompress(buffer)
                buffer = f.read(CHUNKSIZE)
        except zlib.error as e:
            # Stop at the first chunk zlib rejects; keep what was decoded so far
            print('Error: %s -> %s' % (filename, e))
        return result
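
Since the files in the question are lines of text, here is a minimal sketch of the same chunk-by-chunk idea that also drops the last, possibly cut-off line. The helper name recover_lines, the UTF-8 encoding, and the truncated sample used to simulate damage are assumptions for illustration, not part of the answer above. It only recovers data before the damage; it does not try to skip past it. Note also that zlib may not raise at the exact damaged byte, so the last recovered lines are worth a sanity check.

```python
import gzip
import zlib


def recover_lines(raw, chunk_size=1024):
    """Decompress as much of a gzip byte string as possible and
    return only the complete (newline-terminated) text lines."""
    d = zlib.decompressobj(zlib.MAX_WBITS | 32)  # auto-detect gzip/zlib header
    recovered = bytearray()
    try:
        for start in range(0, len(raw), chunk_size):
            recovered += d.decompress(raw[start:start + chunk_size])
    except zlib.error:
        pass  # stop at the first chunk zlib rejects
    text = bytes(recovered)
    # Discard anything after the last newline: it may be a partial line.
    end = text.rfind(b'\n')
    complete = text[:end + 1] if end != -1 else b''
    # Assumption: the text is UTF-8; undecodable bytes are replaced.
    return complete.decode('utf-8', errors='replace').splitlines()


# Simulate a damaged file by cutting a valid gzip stream in half.
sample = gzip.compress(b''.join(b'line %d\n' % i for i in range(1000)))
damaged = sample[:len(sample) // 2]
lines = recover_lines(damaged)
print(len(lines), 'complete lines recovered')
```

A smaller chunk_size loses less when an error is raised, because the output of the chunk that triggers the error is discarded along with it.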

Regarding python - How to extract data from a corrupted gzip file in Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/26794514/
