python - Unable to free memory consumed by numpy arrays

Tags: python numpy memory numpy-memmap

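I have a set of 5 files in .npz format. I need to extract the numpy arrays from these files one by one and use each one to train a model. After loading the first numpy array into memory and training the model with it, I try to remove it from memory by slicing it, but the memory consumed does not decrease. As a result, I cannot load the second numpy array and end up with a MemoryError.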

How can I make sure the memory is freed after training the model?

PS: X_test and y_test are very small and can be ignored.

Code:

for person_id in range(1, 5):
     print "Initial memory ",resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
     temp1 = np.load("../final_data/speaker_input" + str(person_id))
     X_train = temp1['arr_0']
     y_train = np.load("../final_data/speaker_final_output" + str(person_id)+ ".npy")
     X_test,y_test = data.Test(person_id=1)
     print "Input dimension ", X_train.shape
     print "Output dimension",y_train.shape
     print "Before training ",resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
     lipreadtrain.train(model=net,X_train=X_train, y_train=y_train,X_test=X_test, y_test=y_test)
     print "After training ", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
     X_train =  X_train[:1]
     y_train = y_train[:1]
     X_test = X_test[:1]
     y_test = y_test[:1]
     print len(X_train),len(y_train),len(X_test),len(y_test)
     gc.collect()
     temp1.close()

Output:

Initial memory  861116
Input dimension  (8024, 50, 2800)
Output dimension (8024, 53)
Before training  9642152
Training the model, which will take a long long time...
Epoch 1/1
8024/8024 [==============================] - 42s - loss: nan - acc: 0.2316        
----- Training Takes 42.3187870979 Seconds -----
Finished!
After training  9868080
1 1 0 0
Initial memory  9868080
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    X_train = temp1['arr_0']
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 224, in __getitem__
    pickle_kwargs=self.pickle_kwargs)
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/numpy/lib/format.py", line 661, in read_array
    array = numpy.empty(count, dtype=dtype)
MemoryError

Best answer

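The problem is that a slice is only a view, and a view keeps the underlying memory alive even if you delete or rebind the name of the parent array. As an example: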

from numpy import zeros, uint8
res=[]
i=0
while True:
    t=zeros(int(1e10),uint8)
    u=t[:1]          # a view: it holds a reference to t's whole buffer
    res.append(u)    # keeping the view keeps the full allocation alive
    i=i+1
    print(i)

Gives:

In [17]: (executing lines 1 to 9 of "<tmp 1>")
1
2
Traceback (most recent call last):
  File "<tmp 1>", line 5, in <module>
    t=zeros(int(1e10),uint8)
MemoryError
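
One way to see why the one-element slice pins the whole allocation is to inspect its `base` attribute. The following is a minimal sketch, not part of the original answer; the array size is scaled down (an assumption, so it runs on any machine) and the variable names are illustrative.

```python
import numpy as np

# Scaled-down stand-in for the large array (size is an illustrative assumption).
t = np.zeros(10**6, dtype=np.uint8)

view = t[:1]          # basic slicing returns a view onto t's buffer
copy = t[:1].copy()   # .copy() allocates a fresh 1-element buffer

view_is_pinned = view.base is t          # the view holds a reference to t
copy_is_independent = copy.base is None  # the copy owns its own data
shares = np.shares_memory(view, t)       # the view and t overlap in memory

print(view_is_pinned, copy_is_independent, shares)
```

As long as `view` is reachable, `t`'s buffer cannot be released, whereas `copy` has no link back to `t`.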

Now simply make a copy, u=t[:1].copy() instead of u=t[:1], and t is freed on every iteration of the loop:

In [18]: (executing lines 1 to 9 of "<tmp 1>")
1
2
3
4
5
....
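
Applied to a load-train-discard loop like the one in the question, the same idea might look like the sketch below. `load_batch` is a hypothetical stand-in for the `np.load(...)` calls, the sizes are deliberately tiny, and the weak reference is there only to observe that the large array is actually destroyed (this relies on CPython's reference counting).

```python
import weakref
import numpy as np

def load_batch(i):
    # Hypothetical stand-in for loading the i-th file; small size for illustration.
    return np.full((1000, 50), i, dtype=np.float32)

kept = []    # small pieces we want to keep across iterations
freed = []   # whether each large array was released before the next load

for i in range(1, 4):
    X_train = load_batch(i)
    ref = weakref.ref(X_train)       # observe the large array without keeping it alive
    # ... train on X_train here ...
    kept.append(X_train[:1].copy())  # copy, so nothing refers to the large buffer
    del X_train                      # drop the last reference to the large array
    freed.append(ref() is None)      # True once the array has been destroyed

print(freed)
```

Note that `ru_maxrss`, as used in the question, reports the peak resident set size, so that number does not go down even after memory has been released.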

Source for "python - Unable to free memory consumed by numpy arrays": the corresponding question on Stack Overflow: https://stackoverflow.com/questions/35957179/
