Python decompression relative performance?

TL;DR: Of the various compression algorithms available in Python (gzip, bz2, lzma, etc.), which has the best decompression performance?

Full discussion:

Python 3 has various modules for compressing/decompressing data, including gzip, bz2 and lzma. gzip and bz2 additionally have different compression levels you can set.

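If my goal is to balance file size (compression ratio) and decompression speed (compression speed is not a concern), which is going to be the best choice? Decompression speed is more important than file size, but since the uncompressed files in question would be around 600-800MB each (32-bit RGB .png image files), and I have a dozen of them, I do want some compression.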

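My use case is that I load a dozen images from disk, do some processing on them (as numpy arrays), and then use the processed array data in my program.
- The images never change; I just have to load them each time I run my program.
- The processing takes about the same time as the loading (several seconds), so I'm trying to save some loading time by saving the processed data (using pickle) rather than loading the raw, unprocessed images every time. Initial tests were promising: loading the raw/uncompressed pickled data took less than a second, versus 3 or 4 seconds to load and process the original image. But as mentioned, this resulted in file sizes of around 600-800MB, while the original png images were only around 5MB. So I'm hoping I can strike a balance between loading time and file size by storing the pickled data in a compressed format.

UPDATE: The situation is actually a bit more complicated than I described above. My application uses PySide2, so I have access to the Qt libraries.
- If I read the images and convert to a numpy array using pillow (PIL.Image), I don't actually have to do any processing, but the total time to read the image into the array is around 4 seconds.
- If I instead use QImage to read the image, I then have to do some processing on the result to make it usable for the rest of my program, because of how QImage loads the data. Basically I have to swap the bit order and then rotate each "pixel" so that the alpha channel (which is apparently added by QImage) comes last rather than first. This whole process takes about 3.8 seconds, so marginally faster than just using PIL.
- If I save the numpy array uncompressed, then I can load it back in 0.8 seconds, which is by far the fastest, but with a large file size.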
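
The channel rotation described in the QImage bullet could look roughly like the following. This is only a sketch: it assumes the pixel bytes are already in A, R, G, B order (that is, after any byte-order swap), it works on a plain `bytes` buffer rather than a numpy array, and `argb_to_rgba` is a hypothetical helper, not part of Qt or PIL.

```python
def argb_to_rgba(pixels):
    """Rotate each 4-byte pixel so that alpha moves from first to last."""
    if len(pixels) % 4 != 0:
        raise ValueError("expected 4 bytes per pixel")
    out = bytearray(len(pixels))
    out[0::4] = pixels[1::4]  # R
    out[1::4] = pixels[2::4]  # G
    out[2::4] = pixels[3::4]  # B
    out[3::4] = pixels[0::4]  # A
    return bytes(out)

# Two pixels stored as A, R, G, B: opaque red and half-transparent blue.
argb = bytes([255, 255, 0, 0, 128, 0, 0, 255])
rgba = argb_to_rgba(argb)
print(list(rgba))
```

On a numpy array whose last axis holds the four channels, the same reordering can be written as `arr[..., [1, 2, 3, 0]]`.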

┌────────────┬────────────────────────┬───────────────┬─────────────┐
│ Python Ver │     Library/Method     │ Read/unpack + │ Compression │
│            │                        │ Decompress (s)│    Ratio    │
├────────────┼────────────────────────┼───────────────┼─────────────┤
│ 3.7.2      │ pillow (PIL.Image)     │ 4.0           │ ~0.006      │
│ 3.7.2      │ Qt (QImage)            │ 3.8           │ ~0.006      │
│ 3.7.2      │ numpy (uncompressed)   │ 0.8           │ 1.0         │
│ 3.7.2      │ gzip (compresslevel=9) │ ?             │ ?           │
│ 3.7.2      │ gzip (compresslevel=?) │ ?             │ ?           │
│ 3.7.2      │ bz2 (compresslevel=9)  │ ?             │ ?           │
│ 3.7.2      │ bz2 (compresslevel=?)  │ ?             │ ?           │
│ 3.7.2      │ lzma                   │ ?             │ ?           │
├────────────┼────────────────────────┼───────────────┼─────────────┤
│ 3.7.3      │ ?                      │ ?             │ ?           │  
├────────────┼────────────────────────┼───────────────┼─────────────┤
│ 3.8beta1   │ ?                      │ ?             │ ?           │
├────────────┼────────────────────────┼───────────────┼─────────────┤
│ 3.8.0final │ ?                      │ ?             │ ?           │
├────────────┼────────────────────────┼───────────────┼─────────────┤
│ 3.5.7      │ ?                      │ ?             │ ?           │
├────────────┼────────────────────────┼───────────────┼─────────────┤
│ 3.6.10     │ ?                      │ ?             │ ?           │
└────────────┴────────────────────────┴───────────────┴─────────────┘

Sample .png image: As an example, take this 5.0 MB png image, a fairly high resolution image of the coastline of Alaska.

Code for the png/PIL case (load into a numpy array):

from PIL import Image
import time
import numpy

start = time.time()
FILE = '/path/to/file/AlaskaCoast.png'
Image.MAX_IMAGE_PIXELS = None
img = Image.open(FILE)
arr = numpy.array(img)
print("Loaded in", time.time()-start)

On my machine with Python 3.7.2, this load takes around 4.2 seconds.

Alternatively, I can load the uncompressed pickle file generated by pickling the array created above.

Code for the uncompressed pickle load case:

import pickle
import time

start = time.time()    
with open('/tmp/test_file.pickle','rb') as picklefile:
  arr = pickle.load(picklefile)    
print("Loaded in", time.time()-start)

Loading from this uncompressed pickle file takes around 0.8 seconds on my machine.
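
The gzip, bz2 and lzma rows of the table above could be filled in with a small timing harness along these lines. This is only a sketch: `payload` is synthetic stand-in data rather than a real pickled image, the helper name `measure` is made up for illustration, and the numbers it prints say nothing about the actual files.

```python
import bz2
import gzip
import lzma
import time

def measure(name, compress, decompress, payload):
    """Return (name, compression ratio, decompression seconds) for one codec."""
    packed = compress(payload)
    start = time.perf_counter()
    unpacked = decompress(packed)
    elapsed = time.perf_counter() - start
    assert unpacked == payload  # the round trip must be lossless
    return name, len(packed) / len(payload), elapsed

# Synthetic, highly repetitive stand-in for pickled image data (assumption).
payload = bytes(range(256)) * 4_000  # about 1 MB

codecs = [
    ("gzip (compresslevel=9)", lambda d: gzip.compress(d, compresslevel=9), gzip.decompress),
    ("gzip (compresslevel=1)", lambda d: gzip.compress(d, compresslevel=1), gzip.decompress),
    ("bz2 (compresslevel=9)", lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
    ("lzma", lzma.compress, lzma.decompress),
]

results = [measure(name, c, d, payload) for name, c, d in codecs]
for name, ratio, seconds in results:
    print(f"{name:24s} ratio={ratio:.4f} decompress={seconds:.4f}s")
```

To get meaningful figures, `payload` would be replaced by the bytes of the real pickled array and each measurement repeated several times.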

Best answer

You can use Python-blosc

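It is very fast and, for small arrays (<2GB), also quite easy to use. On easily compressible data like your example, it is often faster to compress the data for IO operations (SATA SSD: about 500 MB/s, PCIe SSD: up to 3500 MB/s). In the decompression step, the array allocation is the most costly part. If your images have similar shapes, you can avoid the repeated memory allocation.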
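
The trade-off described above can be put into rough numbers. The sketch below assumes load time is simply the bytes read divided by sequential disk bandwidth, plus the time spent decompressing. The function name `estimated_load_seconds` is hypothetical, the 629 MB array size is inferred from the throughput figures in the benchmarks further down, and caching, seek time and parallelism are ignored.

```python
def estimated_load_seconds(size_on_disk_mb, disk_mb_per_s, decompress_s=0.0):
    """Rough model: sequential read time plus decompression time."""
    return size_on_disk_mb / disk_mb_per_s + decompress_s

ARRAY_MB = 629.0       # uncompressed size implied by the benchmark throughput (assumption)
COMPRESSED_MB = 4.728  # zstd + shuffle file size quoted in the benchmarks
DECOMPRESS_S = 0.146   # decompress() timing quoted in the benchmarks
SATA_MB_S = 500.0      # SATA SSD bandwidth quoted in the answer

raw = estimated_load_seconds(ARRAY_MB, SATA_MB_S)
packed = estimated_load_seconds(COMPRESSED_MB, SATA_MB_S, DECOMPRESS_S)
print(f"uncompressed: {raw:.2f} s, compressed: {packed:.2f} s")
```

The quoted decompression timing probably already includes a cached file read, so the compressed estimate is, if anything, slightly pessimistic.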

Example

The following example assumes a contiguous array.

import blosc
import numpy as np
import pickle

def compress(arr, Path):
    # Compress a contiguous array and write it to Path after a small pickled header.
    # Alternative codec: cname='lz4' (much faster compression, larger file).
    c = blosc.compress_ptr(arr.__array_interface__['data'][0], arr.size, arr.dtype.itemsize,
                           clevel=3, cname='zstd', shuffle=blosc.SHUFFLE)
    with open(Path, "wb") as f:
        pickle.dump((arr.shape, arr.dtype), f)
        f.write(c)
    return c, arr.shape, arr.dtype

def decompress(Path):
    with open(Path, "rb") as f:
        shape, dtype = pickle.load(f)
        c = f.read()
    # array allocation takes most of the time
    arr = np.empty(shape, dtype)
    blosc.decompress_ptr(c, arr.__array_interface__['data'][0])
    return arr

# Pass a preallocated array if you have many similar images
def decompress_pre(Path, arr):
    with open(Path, "rb") as f:
        shape, dtype = pickle.load(f)
        c = f.read()
    # no allocation here: arr must already have the stored shape and dtype
    blosc.decompress_ptr(c, arr.__array_interface__['data'][0])
    return arr

Benchmarks

#blosc.SHUFFLE, cname='zstd' -> 4728KB,  
%timeit compress(arr,"Test.dat")
1.03 s ± 12.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#611 MB/s
%timeit decompress("Test.dat")
146 ms ± 481 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
#4310 MB/s
%timeit decompress_pre("Test.dat",arr)
50.9 ms ± 438 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
#12362 MB/s

#blosc.SHUFFLE, cname='lz4' -> 9118KB, 
%timeit compress(arr,"Test.dat")
32.1 ms ± 437 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
#19602 MB/s
%timeit decompress("Test.dat")
146 ms ± 332 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
#4310 MB/s
%timeit decompress_pre("Test.dat",arr)
53.6 ms ± 82.9 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
#11740 MB/s
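
The `shuffle` filter is a large part of why typed numeric data compresses so well in these benchmarks. As an illustration only, the sketch below imitates a byte shuffle in pure Python and uses `zlib` from the standard library in place of blosc's codecs. `byte_shuffle` and `byte_unshuffle` are hypothetical helpers, not blosc functions, and the ramp of 32-bit integers is synthetic data.

```python
import struct
import zlib

def byte_shuffle(data, itemsize):
    """Group byte 0 of every item together, then byte 1, and so on."""
    return b"".join(data[k::itemsize] for k in range(itemsize))

def byte_unshuffle(data, itemsize):
    """Inverse of byte_shuffle: interleave the byte planes again."""
    n_items = len(data) // itemsize
    out = bytearray(len(data))
    for k in range(itemsize):
        out[k::itemsize] = data[k * n_items:(k + 1) * n_items]
    return bytes(out)

# Synthetic data: 100,000 slowly increasing 32-bit little-endian integers.
values = range(100_000)
raw = struct.pack(f"<{len(values)}I", *values)

shuffled = byte_shuffle(raw, 4)
plain_size = len(zlib.compress(raw, 6))
shuffled_size = len(zlib.compress(shuffled, 6))
print("plain:", plain_size, "bytes; shuffled:", shuffled_size, "bytes")
```

After shuffling, the high-order bytes form long runs of identical values, which is what lets a general-purpose codec shrink them so effectively.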

Edit

This version is more suitable for general use. It handles F-contiguous, C-contiguous and non-contiguous arrays, as well as arrays >2GB. Also have a look at bloscpack.

import blosc
import numpy as np
import pickle

def compress(file, arr,clevel=3,cname='lz4',shuffle=1):
    """
    file           path to file
    arr            numpy nd-array
    clevel         0..9
    cname          blosclz,lz4,lz4hc,snappy,zlib,zstd
    shuffle        0-> no shuffle, 1->shuffle,2->bitshuffle
    """
    max_blk_size=100_000_000 #100 MB 

    shape=arr.shape
    #dtype object is not implemented (np.object was removed in newer NumPy versions)
    if arr.dtype==object:
        raise TypeError("dtype object is not implemented")

    #Handling of fortran ordered arrays (avoid copy)
    is_f_contiguous=False
    if arr.flags['F_CONTIGUOUS']==True:
        is_f_contiguous=True
        arr=arr.T.reshape(-1)
    else:
        arr=np.ascontiguousarray(arr.reshape(-1))

    #Writing
    max_num=max_blk_size//arr.dtype.itemsize
    num_chunks=arr.size//max_num

    if arr.size%max_num!=0:
        num_chunks+=1

    f=open(file,"wb")
    pickle.dump((shape,arr.size,arr.dtype,is_f_contiguous,num_chunks,max_num),f)
    size=np.empty(1,np.uint32)
    num_write=max_num
    for i in range(num_chunks):
        if max_num*(i+1)>arr.size:
            num_write=arr.size-max_num*i
        c = blosc.compress_ptr(arr[max_num*i:].__array_interface__['data'][0], num_write, 
                               arr.dtype.itemsize, clevel=clevel,cname=cname,shuffle=shuffle)
        size[0]=len(c)
        size.tofile(f)
        f.write(c)
    f.close()

def decompress(file,prealloc_arr=None):
    f=open(file,"rb")
    shape,arr_size,dtype,is_f_contiguous,num_chunks,max_num=pickle.load(f)

    if prealloc_arr is None:
        arr=np.empty(arr_size,dtype)
    else:
        #the preallocated array must be contiguous so it can be viewed as a flat buffer
        if prealloc_arr.flags['F_CONTIGUOUS']==True:
            prealloc_arr=prealloc_arr.T
        if prealloc_arr.flags['C_CONTIGUOUS']!=True:
            raise TypeError("Contiguous array is needed")
        arr=np.frombuffer(prealloc_arr.data, dtype=dtype, count=arr_size)

    for i in range(num_chunks):
        size=np.fromfile(f,np.uint32,count=1)
        c=f.read(size[0])
        blosc.decompress_ptr(c, arr[max_num*i:].__array_interface__['data'][0])
    f.close()

    #reshape
    if is_f_contiguous:
        arr=arr.reshape(shape[::-1]).T
    else:
        arr=arr.reshape(shape)
    return arr
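
The chunk bookkeeping in `compress` above is easy to check in isolation. A minimal sketch, assuming the same rule (at most `max_blk_size` bytes per chunk, with a shorter final chunk); `chunk_plan` is a hypothetical helper written for this illustration and does not call blosc.

```python
def chunk_plan(n_items, itemsize, max_blk_size=100_000_000):
    """Return (start_index, item_count) pairs covering n_items in order."""
    max_num = max_blk_size // itemsize         # items per full chunk
    num_chunks = -(-n_items // max_num)        # ceiling division
    plan = []
    for i in range(num_chunks):
        start = max_num * i
        count = min(max_num, n_items - start)  # the last chunk may be shorter
        plan.append((start, count))
    return plan

# A 3 GB array of 4-byte items splits into 30 chunks of 100 MB each.
plan = chunk_plan(n_items=750_000_000, itemsize=4)
print(len(plan), plan[0], plan[-1])
```

Keeping each chunk far below 2 GB is what lets the chunked version handle arrays larger than a single blosc buffer can hold.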

This question, "Python decompression relative performance?", comes from Stack Overflow: https://stackoverflow.com/questions/56708673/
