Suppose I have the following NumPy array:
[[1,1,1]
[1,1,1]
[1,1,1]]
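and I need to pad each element of the array with a zero on both sides (rather than padding the rows and columns, as numpy.pad() does). The result should look like this: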
[ [0,1,0,0,1,0,0,1,0]
[0,1,0,0,1,0,0,1,0]
[0,1,0,0,1,0,0,1,0] ]
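Is there a more efficient way to do this than creating an empty array and using nested loops?
Note: my preference is for something as fast and lightweight as possible. A single array can have up to 12000^2 elements, and I work with 16 of them at a time, so my margins are very thin on 32-bit.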
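Edit: I should have specified that the padding is not always 1. It has to be variable, because I am upsampling the data to match the array with the highest resolution. Given 3 arrays of shapes (121,121), (1200,1200) and (12010,12010), I need to be able to pad the first two to shape (12010,12010). (I know these numbers don't share a common factor. That is not a problem, since being off by an index or two from the actual position is acceptable. The padding is only there to bring the arrays to the same shape, and rounding out the numbers by padding rows at the end is acceptable.)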
Working solution: adapting @Kasramvd's solution did the trick. Here is the code that fits my application.
import numpy as np
a = np.array([[1, 2, 3],[1, 2, 3], [1, 2, 3]])
print(a)
x, y = a.shape
factor = 3
indices = np.repeat(np.arange(y + 1), 1*factor*2)[1*factor:-1*factor]
a=np.insert(a, indices, 0, axis=1)
print(a)
Result:
[[1 2 3]
[1 2 3]
[1 2 3]]
[[0 0 0 1 0 0 0 0 0 0 2 0 0 0 0 0 0 3 0 0 0]
[0 0 0 1 0 0 0 0 0 0 2 0 0 0 0 0 0 3 0 0 0]
[0 0 0 1 0 0 0 0 0 0 2 0 0 0 0 0 0 3 0 0 0]]
Best answer
Here is an approach using zeros-initialization -
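import numpy as np

def padcols(arr,padlen):
    N = 1+2*padlen                            # output columns per input column
    m,n = arr.shape
    out = np.zeros((m,N*n),dtype=arr.dtype)   # start from all zeros
    out[:,padlen+np.arange(n)*N] = arr        # drop each column into the middle of its slot
    return out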
Sample run -
In [118]: arr
Out[118]:
array([[21, 14, 23],
[52, 70, 90],
[40, 57, 11],
[71, 33, 78]])
In [119]: padcols(arr,1)
Out[119]:
array([[ 0, 21, 0, 0, 14, 0, 0, 23, 0],
[ 0, 52, 0, 0, 70, 0, 0, 90, 0],
[ 0, 40, 0, 0, 57, 0, 0, 11, 0],
[ 0, 71, 0, 0, 33, 0, 0, 78, 0]])
In [120]: padcols(arr,2)
Out[120]:
array([[ 0, 0, 21, 0, 0, 0, 0, 14, 0, 0, 0, 0, 23, 0, 0],
[ 0, 0, 52, 0, 0, 0, 0, 70, 0, 0, 0, 0, 90, 0, 0],
[ 0, 0, 40, 0, 0, 0, 0, 57, 0, 0, 0, 0, 11, 0, 0],
[ 0, 0, 71, 0, 0, 0, 0, 33, 0, 0, 0, 0, 78, 0, 0]])
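The question's edit asks for padding up to a given target shape, with leftover rows and columns at the end filled with zeros. The following is a rough sketch of how the zeros-initialization idea might be extended to that case. `pad_to_shape` is a hypothetical helper, not something from the original answer, and it assumes the spacing per axis is simply the integer quotient `target // size`, with elements placed near the middle of each slot.

```python
import numpy as np

def pad_to_shape(arr, target_shape):
    """Spread a 2D array over a zero array of target_shape (hypothetical helper)."""
    m, n = arr.shape
    tm, tn = target_shape
    if tm < m or tn < n:
        raise ValueError("target shape must be at least as large as arr.shape")
    # Assumed spacing: integer quotient per axis; whatever remains stays
    # zero at the end of each axis.
    sr, sc = tm // m, tn // n
    out = np.zeros(target_shape, dtype=arr.dtype)
    # Offset that puts each element in the middle of its slot
    # (for an even step the extra zero goes after the element).
    r0, c0 = (sr - 1) // 2, (sc - 1) // 2
    out[r0:r0 + sr * m:sr, c0:c0 + sc * n:sc] = arr
    return out

small = np.arange(1, 5).reshape(2, 2)
padded = pad_to_shape(small, (7, 7))
print(padded)
```

For the shapes in the question this would give a step of 99 for the (121,121) array and 10 for the (1200,1200) array. Whether the off-centre placement for even steps is acceptable depends on the application.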
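Benchmarking
In this section I benchmark runtime and memory usage of the approach posted in this answer, padcols, against the function from @Kasramvd's solution, padder, on a decently sized array for various padding lengths.
Timing analysis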
In [151]: arr = np.random.randint(10,99,(300,300))
# Representative of original `3x3` sized array just bigger
In [152]: %timeit padder(arr,1)
100 loops, best of 3: 3.56 ms per loop
In [153]: %timeit padcols(arr,1)
100 loops, best of 3: 2.13 ms per loop
In [154]: %timeit padder(arr,2)
100 loops, best of 3: 5.82 ms per loop
In [155]: %timeit padcols(arr,2)
100 loops, best of 3: 3.66 ms per loop
In [156]: %timeit padder(arr,3)
100 loops, best of 3: 7.83 ms per loop
In [157]: %timeit padcols(arr,3)
100 loops, best of 3: 5.11 ms per loop
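For readers who want to repeat the comparison outside IPython, here is a small self-contained sketch using the standard `timeit` module. `padder` is reconstructed from the body shown in the memory-profiling script, and `padcols_slice` is an additional variant that is not in the original answer: it assigns through a strided slice instead of an integer index array. No timings are quoted here, since they depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

def padder(arr, padlen):
    # np.insert-based approach (reconstructed from @Kasramvd's solution)
    _, ncols = arr.shape
    indices = np.repeat(np.arange(ncols + 1), 2 * padlen)[padlen:-padlen]
    return np.insert(arr, indices, 0, axis=1)

def padcols(arr, padlen):
    # zeros-initialization with an integer index array
    N = 1 + 2 * padlen
    m, n = arr.shape
    out = np.zeros((m, N * n), dtype=arr.dtype)
    out[:, padlen + np.arange(n) * N] = arr
    return out

def padcols_slice(arr, padlen):
    # variant (assumption, not from the answer): strided slice assignment
    N = 1 + 2 * padlen
    m, n = arr.shape
    out = np.zeros((m, N * n), dtype=arr.dtype)
    out[:, padlen::N] = arr
    return out

arr = np.random.randint(10, 99, (300, 300))
for padlen in (1, 2, 3):
    expected = padcols(arr, padlen)
    for func in (padder, padcols, padcols_slice):
        # all three must produce identical output before timing means anything
        assert np.array_equal(func(arr, padlen), expected)
        runs = timeit.repeat(lambda: func(arr, padlen), number=20, repeat=3)
        print(f"{func.__name__:14s} padlen={padlen}: {min(runs) / 20 * 1e3:.3f} ms per call")
```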
Memory analysis
Script used for these memory tests -
import numpy as np
from memory_profiler import profile

arr = np.random.randint(10,99,(300,300))
padlen = 1 # Edited to 1,2,3 for the three cases
n = padlen

@profile(precision=10)
def padder():
    x, y = arr.shape
    indices = np.repeat(np.arange(y+1), n*2)[n:-n]
    return np.insert(arr, indices, 0, axis=1)

@profile(precision=10)
def padcols():
    N = 1+2*padlen
    m,n = arr.shape   # local n; shadows the module-level n inside this function only
    out = np.zeros((m,N*n),dtype=arr.dtype)
    out[:,padlen+np.arange(n)*N] = arr
    return out

if __name__ == '__main__':
    padder()
    padcols()
Memory usage output -
Case #1:
$ python -m memory_profiler timing_pads.py
Filename: timing_pads.py
Line # Mem usage Increment Line Contents
================================================
8 42.4492187500 MiB 0.0000000000 MiB @profile(precision=10)
9 def padder():
10 42.4492187500 MiB 0.0000000000 MiB x, y = arr.shape
11 42.4492187500 MiB 0.0000000000 MiB indices = np.repeat(np.arange(y+1), n*2)[n:-n]
12 44.7304687500 MiB 2.2812500000 MiB return np.insert(arr, indices, 0, axis=1)
Filename: timing_pads.py
Line # Mem usage Increment Line Contents
================================================
14 42.8750000000 MiB 0.0000000000 MiB @profile(precision=10)
15 def padcols():
16 42.8750000000 MiB 0.0000000000 MiB N = 1+2*padlen
17 42.8750000000 MiB 0.0000000000 MiB m,n = arr.shape
18 42.8750000000 MiB 0.0000000000 MiB out = np.zeros((m,N*n),dtype=arr.dtype)
19 44.6757812500 MiB 1.8007812500 MiB out[:,padlen+np.arange(n)*N] = arr
20 44.6757812500 MiB 0.0000000000 MiB return out
Case #2:
$ python -m memory_profiler timing_pads.py
Filename: timing_pads.py
Line # Mem usage Increment Line Contents
================================================
8 42.3710937500 MiB 0.0000000000 MiB @profile(precision=10)
9 def padder():
10 42.3710937500 MiB 0.0000000000 MiB x, y = arr.shape
11 42.3710937500 MiB 0.0000000000 MiB indices = np.repeat(np.arange(y+1), n*2)[n:-n]
12 46.2421875000 MiB 3.8710937500 MiB return np.insert(arr, indices, 0, axis=1)
Filename: timing_pads.py
Line # Mem usage Increment Line Contents
================================================
14 42.8476562500 MiB 0.0000000000 MiB @profile(precision=10)
15 def padcols():
16 42.8476562500 MiB 0.0000000000 MiB N = 1+2*padlen
17 42.8476562500 MiB 0.0000000000 MiB m,n = arr.shape
18 42.8476562500 MiB 0.0000000000 MiB out = np.zeros((m,N*n),dtype=arr.dtype)
19 46.1289062500 MiB 3.2812500000 MiB out[:,padlen+np.arange(n)*N] = arr
20 46.1289062500 MiB 0.0000000000 MiB return out
Case #3:
$ python -m memory_profiler timing_pads.py
Filename: timing_pads.py
Line # Mem usage Increment Line Contents
================================================
8 42.3906250000 MiB 0.0000000000 MiB @profile(precision=10)
9 def padder():
10 42.3906250000 MiB 0.0000000000 MiB x, y = arr.shape
11 42.3906250000 MiB 0.0000000000 MiB indices = np.repeat(np.arange(y+1), n*2)[n:-n]
12 47.4765625000 MiB 5.0859375000 MiB return np.insert(arr, indices, 0, axis=1)
Filename: timing_pads.py
Line # Mem usage Increment Line Contents
================================================
14 42.8945312500 MiB 0.0000000000 MiB @profile(precision=10)
15 def padcols():
16 42.8945312500 MiB 0.0000000000 MiB N = 1+2*padlen
17 42.8945312500 MiB 0.0000000000 MiB m,n = arr.shape
18 42.8945312500 MiB 0.0000000000 MiB out = np.zeros((m,N*n),dtype=arr.dtype)
19 47.4648437500 MiB 4.5703125000 MiB out[:,padlen+np.arange(n)*N] = arr
20 47.4648437500 MiB 0.0000000000 MiB return out
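As a sanity check on the profiler numbers above, the raw size of the padded output can be computed directly. This is a back-of-the-envelope sketch, not part of the original answer. `padded_nbytes` is a hypothetical helper, and the 8-byte integer dtype is an assumption (the default integer type returned by `np.random.randint` depends on the platform and NumPy version).

```python
import numpy as np

def padded_nbytes(shape, padlen, dtype):
    """Bytes occupied by the column-padded output (hypothetical helper)."""
    m, n = shape
    return m * n * (1 + 2 * padlen) * np.dtype(dtype).itemsize

# Assumption: the 300x300 test array holds 8-byte integers.
for padlen in (1, 2, 3):
    size_mib = padded_nbytes((300, 300), padlen, np.int64) / 2**20
    print(f"padlen={padlen}: {size_mib:.2f} MiB")
```

Under that assumption the output alone takes roughly 2.06, 3.43 and 4.81 MiB for the three cases, which is of the same order as the increments reported for both functions. The same helper can be used to estimate the footprint at the array sizes mentioned in the question before committing to a dtype.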
Source: the Stack Overflow question "python - Padding elements of a numpy array": https://stackoverflow.com/questions/39018476/