python - NumPy operations: memory efficiency

I have a 4 GB RAM limit. I need to hold 2.5 GB of data in RAM in order to perform further operations.

import numpy
a = numpy.random.rand(1000000, 100)  # This needs to be in memory
b = numpy.random.rand(1, 100)
c = a - b  # This is needed in order to perform the next operation
d = numpy.linalg.norm(numpy.asarray(c, dtype=numpy.float32), axis=1)

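When c is created, memory usage spikes and Python gets killed. Is there a way to speed up this process? I am running this on an EC2 Ubuntu instance with 4 GB of RAM and a single core. When I run the same computation on my Mac OS X machine, it completes easily without any memory problems and takes less time. Why is that?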

One solution I can think of is

# b has shape (1, 100), so use its single row b[0] to keep each difference 1-D
d = [numpy.sqrt(numpy.dot(i - b[0], i - b[0])) for i in a]

I don't think this is good for speed.

Best answer

If creating a does not cause a memory problem, and you do not need to keep the values in a, you can compute c by modifying a in place:

a -= b  # Now use `a` instead of `c`.
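
The in-place idea can be sketched as follows. For scale, a float64 array of shape (1000000, 100) occupies 1000000 * 100 * 8 bytes = 800 MB, so a and c = a - b together already need about 1.6 GB. The sketch uses a small n as a stand-in and assumes a may be overwritten. The use of np.einsum for the row-wise sum of squares is a substitution that is not part of the answer; it is included on the assumption that np.linalg.norm with axis=1 may allocate another full-size temporary internally.

```python
import numpy as np

n = 1000  # small stand-in; the question uses n = 1000000
rng = np.random.default_rng(0)
a = rng.random((n, 100))
b = rng.random((1, 100))

# Reference result computed the memory-hungry way (allocates c = a - b).
expected = np.linalg.norm(a - b, axis=1)

# In-place subtraction: b is broadcast over the rows of a and the result
# is written back into a, so no second (n, 100) array is created.
np.subtract(a, b, out=a)  # equivalent to `a -= b`

# Row-wise sum of squares without materialising a * a, then the square root.
d = np.sqrt(np.einsum('ij,ij->i', a, a))

print(np.allclose(d, expected))
```

After this runs, a holds the differences and the original values are gone, so this only fits when a is not needed afterwards.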

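Otherwise, the idea of working in smaller blocks or batches is a good one. With your list comprehension solution, you are effectively computing d from a and b in batches of a single row of a. You can improve the efficiency by using a larger batch size. Here is an example; it includes your code (with some cosmetic changes) and a version that computes the result (called d2) in batches of batch_size rows of a.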

import numpy as np

#n = 1000000
n = 1000
a = np.random.rand(n, 100)  # This needs to be in memory
b = np.random.rand(1, 100)
c = a - b  # Full-size temporary; this is what exhausts memory for large n
d = np.linalg.norm(np.asarray(c), axis=1)

batch_size = 300
# Preallocate the result.
d2 = np.empty(n)
for start in range(0, n, batch_size):
    end = min(start + batch_size, n)
    # Only a (batch_size, 100) temporary is allocated per iteration.
    c2 = a[start:end] - b
    d2[start:end] = np.linalg.norm(c2, axis=1)
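
The batching loop can also be wrapped in a reusable function. This is a minimal sketch only: the function name batched_row_norms and its default batch_size are hypothetical and do not come from the answer. It checks the batched result against the direct computation and prints the size of one batch temporary, which is what bounds the extra memory.

```python
import numpy as np

def batched_row_norms(a, b, batch_size=10_000):
    """Return the Euclidean norm of each row of a - b, one batch at a time."""
    n = a.shape[0]
    d = np.empty(n, dtype=np.result_type(a, b))
    for start in range(0, n, batch_size):
        end = min(start + batch_size, n)
        # Only a (batch_size, n_cols) difference array exists at any time.
        d[start:end] = np.linalg.norm(a[start:end] - b, axis=1)
    return d

rng = np.random.default_rng(0)
a = rng.random((1000, 100))
b = rng.random((1, 100))

d_full = np.linalg.norm(a - b, axis=1)
d_batched = batched_row_norms(a, b, batch_size=300)

# Bytes needed for one batch of differences (300 rows of float64 values).
batch_bytes = 300 * a.shape[1] * a.itemsize
print(np.allclose(d_full, d_batched), batch_bytes)
```

A larger batch_size means fewer Python-level loop iterations at the cost of a larger temporary, so it can be tuned to the RAM that is left once a is loaded.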

Regarding python - NumPy operations: memory efficiency, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/25100309/
