python - Padding the elements of a numpy array

Suppose I have the following numpy array:

[[1,1,1]
 [1,1,1]
 [1,1,1]]

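I need to pad each element of the array with a zero on either side (unlike numpy.pad(), which pads whole rows and columns). The result should look like this: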

[ [0,1,0,0,1,0,0,1,0]
  [0,1,0,0,1,0,0,1,0]
  [0,1,0,0,1,0,0,1,0] ]

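Is there a more efficient way to do this than creating an empty array and using nested loops?

Note: My preference is for something as fast and as light on memory as possible. A single array can have up to 12000^2 elements, and I am working with 16 of them at the same time, so my margins are pretty thin in 32 bit.

Edit: I should have specified that the padding is not always 1. It has to be variable, because I am upsampling the data to match the array with the highest resolution. Given 3 arrays of shape (121,121); (1200,1200); (12010,12010), I need to be able to pad the first two arrays to a shape of (12010,12010). (I know these numbers do not share common factors. That is not a problem, since being within an index or two of the actual location is acceptable. This padding is only needed to bring the arrays to the same shape, and rounding out the numbers by padding rows at the end is acceptable.)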
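The arithmetic described in the edit can be sketched roughly as follows. This is only an illustration under stated assumptions: `pad_to_shape` and its inner `block` helper are hypothetical names, the sketch pads along both axes (the small example above pads columns only), and it assumes the target shape is at least as large as the source along each axis. It picks the largest odd block width `1 + 2*padlen` that fits, and leaves the remaining rows and columns at the end as zeros.

```python
import numpy as np

def pad_to_shape(arr, target_shape):
    """Zero-pad around every element; leftover rows/columns stay zero at the end.

    Hypothetical helper. Assumes target_shape >= arr.shape along both axes.
    """
    m, n = arr.shape

    def block(src, dst):
        # Largest odd block width (1 + 2*padlen) such that `src` blocks fit in `dst`.
        b = dst // src
        return b if b % 2 == 1 else b - 1

    bm, bn = block(m, target_shape[0]), block(n, target_shape[1])
    out = np.zeros(target_shape, dtype=arr.dtype)
    # Each source value lands at the centre of its bm x bn block of zeros.
    out[bm // 2 : bm * m : bm, bn // 2 : bn * n : bn] = arr
    return out

small = np.arange(1, 5).reshape(2, 2)
padded = pad_to_shape(small, (7, 7))
print(padded)
```

For the shapes in the question, this would give a block width of 9 (padlen 4) for the (1200,1200) array and 99 (padlen 49) for the (121,121) array, with the leftover rows and columns filled with zeros at the end.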

Working solution: adapting @Kasramvd's solution solved the problem. Here is the code as it fits my application.

import numpy as np

a = np.array([[1, 2, 3],[1, 2, 3], [1, 2, 3]])

print(a)

x, y = a.shape
factor = 3
indices = np.repeat(np.arange(y + 1), 1*factor*2)[1*factor:-1*factor]

a=np.insert(a, indices, 0, axis=1)

print(a)

Result:

 [[1 2 3]
  [1 2 3]
  [1 2 3]]

 [[0 0 0 1 0 0 0 0 0 0 2 0 0 0 0 0 0 3 0 0 0]
  [0 0 0 1 0 0 0 0 0 0 2 0 0 0 0 0 0 3 0 0 0]
  [0 0 0 1 0 0 0 0 0 0 2 0 0 0 0 0 0 3 0 0 0]]

Best answer

Here is an approach using zeros-initialization -

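import numpy as np

def padcols(arr, padlen):
    N = 1 + 2*padlen                       # block width: one value plus padlen zeros on each side
    m, n = arr.shape
    out = np.zeros((m, N*n), dtype=arr.dtype)
    out[:, padlen + np.arange(n)*N] = arr  # place each column at the centre of its block
    return out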

Sample run -

In [118]: arr
Out[118]: 
array([[21, 14, 23],
       [52, 70, 90],
       [40, 57, 11],
       [71, 33, 78]])

In [119]: padcols(arr,1)
Out[119]: 
array([[ 0, 21,  0,  0, 14,  0,  0, 23,  0],
       [ 0, 52,  0,  0, 70,  0,  0, 90,  0],
       [ 0, 40,  0,  0, 57,  0,  0, 11,  0],
       [ 0, 71,  0,  0, 33,  0,  0, 78,  0]])

In [120]: padcols(arr,2)
Out[120]: 
array([[ 0,  0, 21,  0,  0,  0,  0, 14,  0,  0,  0,  0, 23,  0,  0],
       [ 0,  0, 52,  0,  0,  0,  0, 70,  0,  0,  0,  0, 90,  0,  0],
       [ 0,  0, 40,  0,  0,  0,  0, 57,  0,  0,  0,  0, 11,  0,  0],
       [ 0,  0, 71,  0,  0,  0,  0, 33,  0,  0,  0,  0, 78,  0,  0]])
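As a possible variation, the same columns can be addressed with a plain strided slice instead of an integer index array. The sketch below is not part of the benchmarks in this post, and `padcols_slice` is a hypothetical name. It relies on `padlen::N` selecting exactly `n` columns out of `N*n`, which avoids building the `np.arange(n)` index array.

```python
import numpy as np

def padcols_slice(arr, padlen):
    # Hypothetical variant of padcols that uses basic slicing.
    N = 1 + 2 * padlen
    m, n = arr.shape
    out = np.zeros((m, N * n), dtype=arr.dtype)
    # Columns padlen, padlen + N, padlen + 2N, ... receive the original data.
    out[:, padlen::N] = arr
    return out

arr = np.array([[21, 14, 23],
                [52, 70, 90]])
result = padcols_slice(arr, 1)
print(result)
```

Whether this is measurably faster than the index-array version would need to be timed on the actual array sizes.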

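Benchmarking

In this section, I benchmark runtime and memory usage for the two approaches posted here, padcols and @Kasramvd's solution wrapped as the function padder, on a decent-sized array for various padding lengths.

Timing analysis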
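The two-argument `padder` used in the timings is not listed in this post. A minimal sketch, assuming it simply wraps the `np.insert` approach from the question with the padding length as a parameter (and assuming `n >= 1`, since the slice `[n:-n]` is empty for `n = 0`), could look like this. `padcols` is repeated here only so the comparison runs on its own.

```python
import numpy as np

def padder(arr, n):
    # Assumed form of the np.insert-based function; requires n >= 1.
    x, y = arr.shape
    # Insert n zeros before the first column, n after the last,
    # and 2*n between each pair of neighbouring columns.
    indices = np.repeat(np.arange(y + 1), n * 2)[n:-n]
    return np.insert(arr, indices, 0, axis=1)

def padcols(arr, padlen):
    N = 1 + 2 * padlen
    m, n = arr.shape
    out = np.zeros((m, N * n), dtype=arr.dtype)
    out[:, padlen + np.arange(n) * N] = arr
    return out

arr = np.random.randint(10, 99, (4, 3))
same = all(np.array_equal(padder(arr, k), padcols(arr, k)) for k in (1, 2, 3))
print(same)
```

Checking that both functions return identical arrays before timing them helps confirm that the comparison is like for like.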

In [151]: arr = np.random.randint(10,99,(300,300))
           # Representative of original `3x3` sized array just bigger

In [152]: %timeit padder(arr,1)
100 loops, best of 3: 3.56 ms per loop

In [153]: %timeit padcols(arr,1)
100 loops, best of 3: 2.13 ms per loop

In [154]: %timeit padder(arr,2)
100 loops, best of 3: 5.82 ms per loop

In [155]: %timeit padcols(arr,2)
100 loops, best of 3: 3.66 ms per loop

In [156]: %timeit padder(arr,3)
100 loops, best of 3: 7.83 ms per loop

In [157]: %timeit padcols(arr,3)
100 loops, best of 3: 5.11 ms per loop

Memory analysis

Script used for these memory tests -

import numpy as np
from memory_profiler import profile

arr = np.random.randint(10,99,(300,300))
padlen = 1 # Edited to 1,2,3 for the three cases
n = padlen

@profile(precision=10)
def padder():    
    x, y = arr.shape
    indices = np.repeat(np.arange(y+1), n*2)[n:-n]
    return np.insert(arr, indices, 0, axis=1)
    
@profile(precision=10)
def padcols():    
    N = 1+2*padlen
    m,n = arr.shape
    out = np.zeros((m,N*n),dtype=arr.dtype)
    out[:,padlen+np.arange(n)*N] = arr
    return out

if __name__ == '__main__':
    padder()
    padcols()

Memory usage output -

Case #1 (padlen = 1):

$ python -m memory_profiler timing_pads.py
Filename: timing_pads.py

Line #    Mem usage    Increment   Line Contents
================================================
     8  42.4492187500 MiB   0.0000000000 MiB   @profile(precision=10)
     9                             def padder():    
    10  42.4492187500 MiB   0.0000000000 MiB       x, y = arr.shape
    11  42.4492187500 MiB   0.0000000000 MiB       indices = np.repeat(np.arange(y+1), n*2)[n:-n]
    12  44.7304687500 MiB   2.2812500000 MiB       return np.insert(arr, indices, 0, axis=1)


Filename: timing_pads.py

Line #    Mem usage    Increment   Line Contents
================================================
    14  42.8750000000 MiB   0.0000000000 MiB   @profile(precision=10)
    15                             def padcols():    
    16  42.8750000000 MiB   0.0000000000 MiB       N = 1+2*padlen
    17  42.8750000000 MiB   0.0000000000 MiB       m,n = arr.shape
    18  42.8750000000 MiB   0.0000000000 MiB       out = np.zeros((m,N*n),dtype=arr.dtype)
    19  44.6757812500 MiB   1.8007812500 MiB       out[:,padlen+np.arange(n)*N] = arr
    20  44.6757812500 MiB   0.0000000000 MiB       return out

Case #2 (padlen = 2):

$ python -m memory_profiler timing_pads.py
Filename: timing_pads.py

Line #    Mem usage    Increment   Line Contents
================================================
     8  42.3710937500 MiB   0.0000000000 MiB   @profile(precision=10)
     9                             def padder():    
    10  42.3710937500 MiB   0.0000000000 MiB       x, y = arr.shape
    11  42.3710937500 MiB   0.0000000000 MiB       indices = np.repeat(np.arange(y+1), n*2)[n:-n]
    12  46.2421875000 MiB   3.8710937500 MiB       return np.insert(arr, indices, 0, axis=1)


Filename: timing_pads.py

Line #    Mem usage    Increment   Line Contents
================================================
    14  42.8476562500 MiB   0.0000000000 MiB   @profile(precision=10)
    15                             def padcols():    
    16  42.8476562500 MiB   0.0000000000 MiB       N = 1+2*padlen
    17  42.8476562500 MiB   0.0000000000 MiB       m,n = arr.shape
    18  42.8476562500 MiB   0.0000000000 MiB       out = np.zeros((m,N*n),dtype=arr.dtype)
    19  46.1289062500 MiB   3.2812500000 MiB       out[:,padlen+np.arange(n)*N] = arr
    20  46.1289062500 MiB   0.0000000000 MiB       return out

Case #3 (padlen = 3):

$ python -m memory_profiler timing_pads.py
Filename: timing_pads.py

Line #    Mem usage    Increment   Line Contents
================================================
     8  42.3906250000 MiB   0.0000000000 MiB   @profile(precision=10)
     9                             def padder():    
    10  42.3906250000 MiB   0.0000000000 MiB       x, y = arr.shape
    11  42.3906250000 MiB   0.0000000000 MiB       indices = np.repeat(np.arange(y+1), n*2)[n:-n]
    12  47.4765625000 MiB   5.0859375000 MiB       return np.insert(arr, indices, 0, axis=1)


Filename: timing_pads.py

Line #    Mem usage    Increment   Line Contents
================================================
    14  42.8945312500 MiB   0.0000000000 MiB   @profile(precision=10)
    15                             def padcols():    
    16  42.8945312500 MiB   0.0000000000 MiB       N = 1+2*padlen
    17  42.8945312500 MiB   0.0000000000 MiB       m,n = arr.shape
    18  42.8945312500 MiB   0.0000000000 MiB       out = np.zeros((m,N*n),dtype=arr.dtype)
    19  47.4648437500 MiB   4.5703125000 MiB       out[:,padlen+np.arange(n)*N] = arr
    20  47.4648437500 MiB   0.0000000000 MiB       return out
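As a rough cross-check on the increments above, the size of the padded output can be estimated directly from the input size. The sketch below assumes 8-byte integers, which is what `np.random.randint` produces by default on many 64-bit Linux builds but not on every platform, so the dtype is set explicitly. `expected_mib` is a hypothetical name.

```python
import numpy as np

# Assumption: 8-byte integers, set explicitly because the default is platform-dependent.
arr = np.random.randint(10, 99, (300, 300)).astype(np.int64)

expected_mib = {}
for padlen in (1, 2, 3):
    N = 1 + 2 * padlen
    # The output has N times as many columns, hence N times as many bytes.
    expected_mib[padlen] = arr.nbytes * N / 2**20
    print(padlen, round(expected_mib[padlen], 2), "MiB")
```

Under that assumption the estimates come to roughly 2.06, 3.43 and 4.81 MiB, which is in the same range as the measured increments for both functions.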

This Q&A on padding the elements of a numpy array comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/39018476/
