I have a large array

data = np.empty((n, k))

where n and k are both large. I also have many generators g, each yielding k elements, and I want to load each generator into one row of data. I can do:

data[i] = list(g)

or something similar, but that makes a copy of the data in g. I could load it with a for loop:

for j, x in enumerate(g):
    data[i, j] = x

but I'm wondering whether numpy already has a way to do this without copying or looping in Python.

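I know that g has length k and am happy to do some __len__ subclass patching if necessary. np.fromiter accepts something like this when it creates a new array, but because of constraints in my context I would rather load into this already-existing array if at all possible.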

Best answer

As mentioned in the comments, there isn't much you can do.

That said, you could consider these two options:

Using `numpy.fromiter`

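Instead of creating data = np.empty((n, k)) yourself, use numpy.fromiter with the count argument, which is made for exactly this situation, where you know the number of items in advance. That way numpy doesn't have to "guess" the size and keep reallocating until the guess is large enough. Using fromiter also lets the for loop run in C instead of Python. This might be a bit faster, but the real bottleneck is probably in your generators anyway.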
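A small sketch of what `count` does, based on the documented behaviour of `numpy.fromiter`: the output has exactly `count` items, only that many are consumed from the iterator, and an iterator that runs out early raises `ValueError`. The helper `make_row` is hypothetical and only stands in for one of the question's generators.

```python
import numpy as np

k = 5

def make_row(i):
    # Hypothetical stand-in for one of the question's generators g
    return (i * j for j in range(k))

# With count=k numpy allocates exactly k items up front instead of growing a buffer
row = np.fromiter(make_row(3), dtype=np.float64, count=k)
print(row.tolist())  # [0.0, 3.0, 6.0, 9.0, 12.0]

# count also caps consumption: only the first k items of a longer iterator are read
capped = np.fromiter(iter(range(100)), dtype=np.int64, count=k)
print(capped.tolist())  # [0, 1, 2, 3, 4]

# An iterator shorter than count is an error rather than a silently short array
short_failed = False
try:
    np.fromiter((x for x in range(k - 1)), dtype=np.float64, count=k)
except ValueError as exc:
    short_failed = True
    print("ValueError:", exc)
```

Because the length is passed explicitly, no `__len__` patching on the generators is needed.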

Note that fromiter only handles flat (1-D) arrays, so you need to read everything as one flat stream (e.g. with chain.from_iterable) and only then call reshape:

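import numpy
from itertools import chain

n = 20
k = 4
# One generator per row; row i yields i*j for j in 0..k-1
generators = (
    (i*j for j in range(k))
    for i in range(n)
)

# Flatten all rows into one stream, read exactly n*k items, then reshape
flat_gen = chain.from_iterable(generators)
data = numpy.fromiter(flat_gen, 'int64', count=n*k)
data = data.reshape((n, k))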
"""
array([[ 0,  0,  0,  0],
       [ 0,  1,  2,  3],
       [ 0,  2,  4,  6],
       [ 0,  3,  6,  9],
       [ 0,  4,  8, 12],
       [ 0,  5, 10, 15],
       [ 0,  6, 12, 18],
       [ 0,  7, 14, 21],
       [ 0,  8, 16, 24],
       [ 0,  9, 18, 27],
       [ 0, 10, 20, 30],
       [ 0, 11, 22, 33],
       [ 0, 12, 24, 36],
       [ 0, 13, 26, 39],
       [ 0, 14, 28, 42],
       [ 0, 15, 30, 45],
       [ 0, 16, 32, 48],
       [ 0, 17, 34, 51],
       [ 0, 18, 36, 54],
       [ 0, 19, 38, 57]])
"""

Using Cython

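If you want to reuse data and avoid reallocating memory, you can no longer use numpy's fromiter. IMHO the only way to avoid a Python for loop is to implement it in Cython. Again, this is very likely overkill, because you still have to read the generators in Python.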
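A rough sketch of the kind of helper one might move into Cython, written here as plain Python so it runs as-is. The name `fill_row` and its length checks are illustrative assumptions, not from the answer. Cython can compile an unmodified .py file, but typed declarations (for example a typed memoryview for `row`) would be needed before the loop gets meaningfully faster, and each item still comes from a Python generator.

```python
import numpy as np

def fill_row(row, it):
    """Write the items of `it` into the 1-D array `row` in place.

    Hypothetical helper: raises ValueError if `it` does not yield exactly
    row.shape[0] items, instead of silently leaving stale values.
    """
    n_items = row.shape[0]
    j = 0
    for x in it:
        if j >= n_items:
            raise ValueError("iterator longer than row")
        row[j] = x
        j += 1
    if j != n_items:
        raise ValueError("iterator shorter than row")

# data[i] is a view, so writing into it fills the preallocated array directly
data = np.empty((3, 4))
for i in range(3):
    fill_row(data[i], (i + j for j in range(4)))

print(data.tolist())
```

Raising on a length mismatch is a design choice; fromiter with count would instead stop reading after count items.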

For reference, the C implementation of fromiter is here: https://github.com/numpy/numpy/blob/v1.18.3/numpy/core/src/multiarray/ctors.c#L4001-L4118

Regarding "python - Load data from generators into a preallocated numpy array", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55924831/
