python - 如何使用开始和结束索引对 numpy 行进行切片

index = np.array([[1,2],[2,4],[1,5],[5,6]])
z = np.zeros(shape = [4,10], dtype = np.float32)

设置z[np.arange(4),index[:,0]], z[np.arange(4), index[:, 1]] 它们之间的所有值都为 1?

预期输出:

array([[0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 1, 1, 0, 0, 0, 0, 0],
       [0, 1, 1, 1, 1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]])

最佳答案

我们可以利用 NumPy broadcasting对于矢量化解决方案，只需将开始和结束索引与覆盖列长度的范围数组进行比较，即可为我们提供一个掩码，该掩码表示输出数组中需要分配为 1s 的所有位置。

所以，解决方案应该是这样的——

ncols = z.shape[1]
r = np.arange(z.shape[1])
mask = (index[:,0,None] <= r) & (index[:,1,None] >= r)
z[mask] = 1

sample 运行-

In [39]: index = np.array([[1,2],[2,4],[1,5],[5,6]])
    ...: z = np.zeros(shape = [4,10], dtype = np.float32)

In [40]: ncols = z.shape[1]
    ...: r = np.arange(z.shape[1])
    ...: mask = (index[:,0,None] <= r) & (index[:,1,None] >= r)
    ...: z[mask] = 1

In [41]: z
Out[41]: 
array([[0., 1., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 1., 1., 0., 0., 0., 0., 0.],
       [0., 1., 1., 1., 1., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 1., 0., 0., 0.]], dtype=float32)

如果z总是一个zeros-initialized数组，我们可以直接从mask获取输出-

z = mask.astype(int)

sample 运行-

In [37]: mask.astype(int)
Out[37]: 
array([[0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 1, 1, 0, 0, 0, 0, 0],
       [0, 1, 1, 1, 1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]])

基准测试

比较@hpaulj 的foo0 和我的foo4，如@hpaulj 的帖子中所列，用于1000 行和可变列数的集合。我们从 10 列开始，因为这是输入样本的列出方式，我们给它更多的行 - 1000。我们会将列数增加到 1000。

这是时间安排-

In [14]: ncols = 10
    ...: index = np.random.randint(0,ncols,(10000,2))
    ...: z = np.zeros(shape = [len(index),ncols], dtype = np.float32)

In [15]: %timeit foo0(z,index)
    ...: %timeit foo4(z,index)
100 loops, best of 3: 6.27 ms per loop
1000 loops, best of 3: 594 µs per loop

In [16]: ncols = 100
    ...: index = np.random.randint(0,ncols,(10000,2))
    ...: z = np.zeros(shape = [len(index),ncols], dtype = np.float32)

In [17]: %timeit foo0(z,index)
    ...: %timeit foo4(z,index)
100 loops, best of 3: 6.49 ms per loop
100 loops, best of 3: 2.74 ms per loop

In [38]: ncols = 300
    ...: index = np.random.randint(0,ncols,(1000,2))
    ...: z = np.zeros(shape = [len(index),ncols], dtype = np.float32)

In [39]: %timeit foo0(z,index)
    ...: %timeit foo4(z,index)
1000 loops, best of 3: 657 µs per loop
1000 loops, best of 3: 600 µs per loop

In [40]: ncols = 1000
    ...: index = np.random.randint(0,ncols,(1000,2))
    ...: z = np.zeros(shape = [len(index),ncols], dtype = np.float32)

In [41]: %timeit foo0(z,index)
    ...: %timeit foo4(z,index)
1000 loops, best of 3: 673 µs per loop
1000 loops, best of 3: 1.78 ms per loop

因此，选择最好的一个将取决于循环和基于广播的向量化问题集之间的列数。

关于python - 如何使用开始和结束索引对 numpy 行进行切片，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/49188161/

python - 如何使用开始和结束索引对 numpy 行进行切片

基准测试

上一篇：python - 在加载旧对象时使用 python 的属性

下一篇：python - 在 df 列上迭代不同的正则表达式模式