python - Indexing a numpy array with a numpy array of slices

(Edit: I wrote a solution based on hpaulj's answer; see the code at the bottom of this post.)

I wrote a function that splits an n-dimensional array into smaller arrays so that each chunk has at most max_chunk_size elements in total.

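Since I need to split many arrays of the same shape and then operate on the corresponding blocks, the function does not operate on the data itself. Instead it creates an array of "indexers", i.e. an array of (slice(x1, x2), slice(y1, y2), ...) objects (see the code below). With these indexers I can retrieve a chunk by calling the_array[indexer[i]] (see the examples below).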

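In addition, the array of indexers has the same number of dimensions as the input, and the divisions are aligned along the corresponding axes, i.e. the blocks the_array[indexer[i,j,k]] and the_array[indexer[i+1,j,k]] are adjacent along axis 0, and so on.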

I expected that I could also concatenate these blocks by calling the_array[indexer[i:i+2,j,k]], and that the_array[indexer] would simply return the_array, but such calls result in an error:

IndexError: arrays used as indices must be of integer (or boolean) type

Is there a simple way around this error?

Here is the code:

import numpy as np
import itertools
from functools import reduce  # reduce is not a builtin on Python 3

def subdivide(shape, max_chunk_size=500000):
    shape = np.array(shape).astype(float)
    total_size = shape.prod()

    # calculate maximum slice shape:
    slice_shape = np.floor(shape * min(max_chunk_size / total_size, 1.0)**(1./len(shape))).astype(int)

    # create a list of slices for each dimension:
    slices = [[slice(left, min(right, n)) \
      for left, right in zip(range(0, n, step_size), range(step_size, n + step_size, step_size))] \
        for n, step_size in zip(shape.astype(int), slice_shape)]

    # np.object was removed in NumPy 1.24; the builtin object is equivalent
    result = np.empty(reduce(lambda a,b:a*len(b), slices, 1), dtype=object)
    for i, el in enumerate(itertools.product(*slices)): result[i] = el
    result.shape = np.ceil(shape / slice_shape).astype(int)
    return result
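
The slice-shape calculation above scales every axis by the same factor, the n-th root of max_chunk_size / total_size, and rounds down. A small worked example for the (6, 15) array used below, written as a standalone sketch (the variable names ratio and factor are only for illustration):

```python
import numpy as np

shape = np.array((6, 15), dtype=float)
max_chunk_size = 16

# Fraction of the total size that one chunk may occupy, capped at 1.
ratio = min(max_chunk_size / shape.prod(), 1.0)       # 16 / 90
# Spread the reduction evenly over all axes.
factor = ratio ** (1.0 / len(shape))                  # sqrt(16 / 90), about 0.42
slice_shape = np.floor(shape * factor).astype(int)    # floor([2.53, 6.32])

print(slice_shape)          # [2 6]
print(slice_shape.prod())   # 12, which is <= max_chunk_size

# Number of chunks along each axis: ceil(6 / 2) and ceil(15 / 6).
n_chunks = np.ceil(shape / slice_shape).astype(int)
print(n_chunks)             # [3 3]
```

Note that rounding down can produce a step of 0 for a short axis when max_chunk_size is small (for example shape (1, 100) with max_chunk_size 4), and range() raises a ValueError for a zero step; the function as posted does not guard against that case.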

Here is an example usage:

>>> ar = np.arange(90).reshape(6,15)
>>> ar
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
       [45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
       [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])

>>> slices = subdivide(ar.shape, 16)
>>> slices
array([[(slice(0, 2, None), slice(0, 6, None)),
        (slice(0, 2, None), slice(6, 12, None)),
        (slice(0, 2, None), slice(12, 15, None))],
       [(slice(2, 4, None), slice(0, 6, None)),
        (slice(2, 4, None), slice(6, 12, None)),
        (slice(2, 4, None), slice(12, 15, None))],
       [(slice(4, 6, None), slice(0, 6, None)),
        (slice(4, 6, None), slice(6, 12, None)),
        (slice(4, 6, None), slice(12, 15, None))]], dtype=object)

>>> ar[slices[1,0]]
array([[30, 31, 32, 33, 34, 35],
       [45, 46, 47, 48, 49, 50]])
>>> ar[slices[0,2]]
array([[12, 13, 14],
       [27, 28, 29]])
>>> ar[slices[2,1]]
array([[66, 67, 68, 69, 70, 71],
       [81, 82, 83, 84, 85, 86]])

>>> ar[slices[:2,1:3]]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: arrays used as indices must be of integer (or boolean) type

Here is the solution based on hpaulj's answer:

import numpy as np
import itertools

class Subdivision():
    def __init__(self, shape, max_chunk_size=500000):
        shape = np.array(shape).astype(float)
        total_size = shape.prod()

        # calculate maximum slice shape:
        slice_shape = np.floor(shape * min(max_chunk_size / total_size, 1.0)**(1./len(shape))).astype(int)

        # create a list of slices for each dimension:
        slices = [[slice(left, min(right, n)) \
          for left, right in zip(range(0, n, step_size), range(step_size, n + step_size, step_size))] \
            for n, step_size in zip(shape.astype(int), slice_shape)]

        # np.object was removed in NumPy 1.24; use the builtin object instead
        self.slices = \
            np.array(list(itertools.product(*slices)), \
                     dtype=object).reshape(tuple(np.ceil(shape / slice_shape).astype(int)) + (len(shape),))

    def __getitem__(self, args):
        if type(args) != tuple: args = (args,)

        # turn integer index into equivalent slice
        args = tuple(slice(arg, arg + 1 if arg != -1 else None) if type(arg) == int else arg for arg in args)

        # select the slices
        # always select all elements from the last axis (which contains slices for each data dimension)
        slices = self.slices[args + ((slice(None),) if Ellipsis in args else (Ellipsis, slice(None)))]

        return np.ix_(*tuple(np.r_[tuple(slices[tuple([0] * i + [slice(None)] + \
                                                      [0] * (len(slices.shape) - 2 - i) + [i])])] \
                                for i in range(len(slices.shape) - 1)))

Usage example:

>>> ar = np.arange(90).reshape(6,15)
>>> ar
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
       [45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
       [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])

>>> subdiv = Subdivision(ar.shape, 16)
>>> ar[subdiv[...]]
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44],
       [45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
       [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])

>>> ar[subdiv[0]]
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])

>>> ar[subdiv[:2,1]]
array([[ 6,  7,  8,  9, 10, 11],
       [21, 22, 23, 24, 25, 26],
       [36, 37, 38, 39, 40, 41],
       [51, 52, 53, 54, 55, 56]])

>>> ar[subdiv[2,:3]]
array([[60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74],
       [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]])

>>> ar[subdiv[...,:2]]
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41],
       [45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71],
       [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86]])

Best answer

Your slices produce 2x6 and 2x3 arrays.

In [36]: subslice=slices[:2,1:3]
In [37]: subslice[0,0]
Out[37]: array([slice(0, 2, None), slice(6, 12, None)], dtype=object)

In [38]: ar[tuple(subslice[0,0])]
Out[38]: 
array([[ 6,  7,  8,  9, 10, 11],
       [21, 22, 23, 24, 25, 26]])

My numpy version wants me to convert the subslice to a tuple. This is the same as

ar[slice(0,2), slice(6,12)]
ar[:2, 6:12]

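This is just the basic syntax of indexing and slicing. ar is 2d, so ar[(i,j)] requires a 2-element tuple whose elements are slices, lists, arrays, or integers. It does not work with an array of slice objects.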
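
A minimal sketch of that distinction, using a hand-built indexer rather than the output of subdivide: a tuple of slices is basic indexing, while an object array holding the same slices is treated as an index array and rejected.

```python
import numpy as np

ar = np.arange(90).reshape(6, 15)

# A tuple of slices is ordinary basic indexing, same as ar[:2, 6:12].
indexer = (slice(0, 2), slice(6, 12))
block = ar[indexer]
print(block.shape)   # (2, 6)

# The same slices stored in an object array are interpreted as an
# index array, which must hold integers or booleans.
as_array = np.empty(2, dtype=object)
as_array[0], as_array[1] = indexer
try:
    ar[as_array]
    failed = False
except IndexError as exc:
    failed = True
    print("IndexError:", exc)

# Converting the object array back to a tuple restores basic indexing.
restored = ar[tuple(as_array)]
```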

As for how to concatenate the results into a larger array: that can be done after indexing, or the slices can be converted into index lists.

np.bmat, for example, joins a 2d arrangement of arrays together:

In [42]: np.bmat([[ar[tuple(subslice[0,0])], ar[tuple(subslice[0,1])]], 
                  [ar[tuple(subslice[1,0])],ar[tuple(subslice[1,1])]]])
Out[42]: 
matrix([[ 6,  7,  8,  9, 10, 11, 12, 13, 14],
        [21, 22, 23, 24, 25, 26, 27, 28, 29],
        [36, 37, 38, 39, 40, 41, 42, 43, 44],
        [51, 52, 53, 54, 55, 56, 57, 58, 59]])

You could generalize this. It just uses hstack and vstack on the nested lists. The result is an np.matrix, but it can be converted back to an array.
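
One way to generalize this, offered as a sketch rather than as part of the original answer, is np.block: it accepts the same nested-list layout and returns a plain ndarray, so no conversion from np.matrix is needed. The per-axis slice lists below are written out by hand for the 6x15 example, and the names row_slices, col_slices, and grid are made up for illustration.

```python
import numpy as np

ar = np.arange(90).reshape(6, 15)

# Hand-written stand-ins for the per-axis slices from the question.
row_slices = [slice(0, 2), slice(2, 4), slice(4, 6)]
col_slices = [slice(0, 6), slice(6, 12), slice(12, 15)]

# Pick the sub-grid that corresponds to slices[:2, 1:3].
rows, cols = row_slices[:2], col_slices[1:3]

# Nested list of blocks: the inner lists are joined horizontally,
# the outer list vertically.
grid = [[ar[r, c] for c in cols] for r in rows]
joined = np.block(grid)

print(joined.shape)   # (4, 9)
```

This copies the data of every block, so for large arrays the index-array approach described next may be preferable.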

The other approach is to use tools such as np.arange, np.r_, and np.ix_ to create index arrays. It takes some playing around to produce an example.

To combine the [0,0] and [0,1] subslices:

In [64]: j = np.r_[subslice[0,0,1],subslice[0,1,1]]
In [65]: i = np.r_[subslice[0,0,0]]

In [66]: i,j
Out[66]: (array([0, 1]), array([ 6,  7,  8,  9, 10, 11, 12, 13, 14]))
In [68]: ix = np.ix_(i,j)
In [69]: ix
Out[69]: 
(array([[0],
        [1]]), array([[ 6,  7,  8,  9, 10, 11, 12, 13, 14]]))

In [70]: ar[ix]
Out[70]: 
array([[ 6,  7,  8,  9, 10, 11, 12, 13, 14],
       [21, 22, 23, 24, 25, 26, 27, 28, 29]])

Or, with i = np.r_[subslice[0,0,0], subslice[1,0,0]], ar[np.ix_(i,j)] produces a 4x9 array.
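
The same steps can be wrapped in a small helper. This is a sketch under assumptions: it expects an object array laid out like the subslice above, where the last axis holds one slice per data dimension, and it relies on the blocks being axis-aligned. The function name combine_indexers is hypothetical.

```python
import numpy as np

def combine_indexers(subslice):
    """Build an np.ix_ index from an n-d grid of per-axis slices.

    subslice has shape (g0, g1, ..., ndim); subslice[..., d] holds the
    slice along data axis d for every block in the grid.
    """
    ndim = subslice.shape[-1]
    axes = []
    for d in range(ndim):
        # Walk along grid axis d while holding every other grid axis at 0.
        # Because the blocks are axis-aligned, this visits each distinct
        # slice along data axis d exactly once.
        idx = [0] * ndim + [d]
        idx[d] = slice(None)
        # np.r_ expands each slice into integers and concatenates them.
        axes.append(np.r_[tuple(subslice[tuple(idx)])])
    return np.ix_(*axes)

# Rebuild a (3, 3, 2) slice grid for the 6x15 example by hand.
ar = np.arange(90).reshape(6, 15)
row_slices = [slice(0, 2), slice(2, 4), slice(4, 6)]
col_slices = [slice(0, 6), slice(6, 12), slice(12, 15)]
slices = np.empty((3, 3, 2), dtype=object)
for i, r in enumerate(row_slices):
    for j, c in enumerate(col_slices):
        slices[i, j, 0] = r
        slices[i, j, 1] = c

result = ar[combine_indexers(slices[:2, 1:3])]
print(result.shape)   # (4, 9)
```

Indexing with np.ix_ is advanced indexing, so result is a copy rather than a view of ar.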

Regarding python - Indexing a numpy array with a numpy array of slices, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42078259/
