python - MemoryError when using Keras ImageDataGenerator

Tags: python tensorflow deep-learning keras

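I am trying to predict features in imagery using Keras with a TensorFlow backend. Specifically, I am trying to use the Keras ImageDataGenerator. The model is set to run for 4 epochs and runs fine until the 4th epoch, where it fails with a MemoryError.

I am running this model on an AWS g2.2xlarge instance running Ubuntu Server 16.04 LTS (HVM), SSD Volume Type.

The training images are 256x256 RGB pixel tiles (8-bit unsigned), and the training masks are 256x256 single-band (8-bit unsigned) tiles, where 255 == a feature of interest and 0 == everything else.

The following 3 functions are the ones pertinent to this error.

How can I resolve this MemoryError?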


def train_model():
        batch_size = 1
        training_imgs = np.lib.format.open_memmap(filename=os.path.join(data_path, 'data.npy'),mode='r+')
        training_masks = np.lib.format.open_memmap(filename=os.path.join(data_path, 'mask.npy'),mode='r+')
        dl_model = create_model()
        print(dl_model.summary())
        model_checkpoint = ModelCheckpoint(os.path.join(data_path,'mod_weight.hdf5'), monitor='loss',verbose=1, save_best_only=True)
        dl_model.fit_generator(generator(training_imgs, training_masks, batch_size), steps_per_epoch=(len(training_imgs)/batch_size), epochs=4,verbose=1,callbacks=[model_checkpoint])

def generator(train_imgs, train_masks=None, batch_size=None):

# Create empty arrays to contain batch of features and labels#

        if train_masks is not None:
                train_imgs_batch = np.zeros((batch_size,y_to_res,x_to_res,bands))
                train_masks_batch = np.zeros((batch_size,y_to_res,x_to_res,1))

                while True:
                        for i in range(batch_size):
                                # choose random index in features
                                index= random.choice(range(len(train_imgs)))
                                train_imgs_batch[i] = train_imgs[index]
                                train_masks_batch[i] = train_masks[index]
                        yield train_imgs_batch, train_masks_batch
        else:
                rec_imgs_batch = np.zeros((batch_size,y_to_res,x_to_res,bands))
                while True:
                        for i in range(batch_size):
                                # choose random index in features
                                index= random.choice(range(len(train_imgs)))
                                rec_imgs_batch[i] = train_imgs[index]
                        yield rec_imgs_batch

def train_generator(train_images,train_masks,batch_size):
        data_gen_args=dict(rotation_range=90.,horizontal_flip=True,vertical_flip=True,rescale=1./255)
        image_datagen = ImageDataGenerator()
        mask_datagen = ImageDataGenerator()
# # Provide the same seed and keyword arguments to the fit and flow methods
        seed = 1
        image_datagen.fit(train_images, augment=True, seed=seed)
        mask_datagen.fit(train_masks, augment=True, seed=seed)
        image_generator = image_datagen.flow(train_images,batch_size=batch_size)
        mask_generator = mask_datagen.flow(train_masks,batch_size=batch_size)
        return zip(image_generator, mask_generator)

Here is the output from the model, detailing the epochs and the error message:

Epoch 00001: loss improved from inf to 0.01683, saving model to /home/ubuntu/deep_learn/client_data/mod_weight.hdf5
Epoch 2/4
7569/7569 [==============================] - 3394s 448ms/step - loss: 0.0049 - binary_crossentropy: 0.0027 - jaccard_coef_int: 0.9983  

Epoch 00002: loss improved from 0.01683 to 0.00492, saving model to /home/ubuntu/deep_learn/client_data/mod_weight.hdf5
Epoch 3/4
7569/7569 [==============================] - 3394s 448ms/step - loss: 0.0049 - binary_crossentropy: 0.0026 - jaccard_coef_int: 0.9982  

Epoch 00003: loss improved from 0.00492 to 0.00488, saving model to /home/ubuntu/deep_learn/client_data/mod_weight.hdf5
Epoch 4/4
7569/7569 [==============================] - 3394s 448ms/step - loss: 0.0074 - binary_crossentropy: 0.0042 - jaccard_coef_int: 0.9975  

Epoch 00004: loss did not improve
Traceback (most recent call last):
  File "image_rec.py", line 291, in <module>
    train_model()
  File "image_rec.py", line 208, in train_model
    dl_model.fit_generator(train_generator(training_imgs,training_masks,batch_size),steps_per_epoch=1,epochs=1,workers=1)
  File "image_rec.py", line 274, in train_generator
    image_datagen.fit(train_images, augment=True, seed=seed)
  File "/home/ubuntu/pyvirt_test/local/lib/python2.7/site-packages/keras/preprocessing/image.py", line 753, in fit
    x = np.copy(x)
  File "/home/ubuntu/pyvirt_test/local/lib/python2.7/site-packages/numpy/lib/function_base.py", line 1505, in copy
    return array(a, order=order, copy=True)
MemoryError

Best answer

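It looks like your problem is that the data is too large. I can see two solutions. The first is to run your code on a distributed system via Spark; I assume you don't have that available, so let's move on to the other.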
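To put a rough number on "too large", here is a back-of-the-envelope estimate. It assumes 7569 training tiles (the steps-per-epoch count in the log, with batch_size = 1) and assumes that ImageDataGenerator.fit converts the array to float32 before the np.copy call seen in the traceback; that conversion is an assumption about the Keras version in use, not something the traceback shows.

```python
# Rough memory estimate for the training images (all figures are assumptions
# derived from the question: 7569 tiles of 256x256 pixels with 3 bands).
n_images = 7569
height, width, bands = 256, 256, 3

# On disk the tiles are 8-bit unsigned: one byte per value.
bytes_uint8 = n_images * height * width * bands

# Assumed: the array is converted to float32 (four bytes per value)
# before it is copied inside fit().
bytes_float32 = bytes_uint8 * 4

gib = 1024.0 ** 3
print("uint8 array:        %.2f GiB" % (bytes_uint8 / gib))
print("float32 array:      %.2f GiB" % (bytes_float32 / gib))
print("float32 plus copy:  %.2f GiB" % (2 * bytes_float32 / gib))
```

Under these assumptions the memory-mapped file itself is modest, but a float32 version held in RAM together with a full copy of it is several times larger, which is consistent with the failure happening inside np.copy.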

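The second is the one I think is viable. I would slice the data and feed it to the model incrementally. We can do this with Dask. This library can slice the data and keep the pieces as objects, so that you can later read back from disk only the parts you want.

If you have an image stored as a 100x100 matrix, we can retrieve each array without loading all 100 arrays into memory. We can load the arrays into memory one at a time (releasing the previous one), and each becomes the input to your neural network.

To do this, you can convert your np.array to a Dask array and assign the partitions. For example: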

>>> import numpy as np
>>> import dask.array as da
>>> k = np.random.randn(10, 10)  # 10x10 matrix
>>> k2 = da.from_array(k, chunks=3)
>>> k2
dask.array<array, shape=(10, 10), dtype=float64, chunksize=(3, 3)>
>>> k2.to_delayed()
array([[Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 0, 0)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 0, 1)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 0, 2)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 0, 3))],
   [Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 1, 0)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 1, 1)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 1, 2)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 1, 3))],
   [Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 2, 0)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 2, 1)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 2, 2)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 2, 3))],
   [Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 3, 0)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 3, 1)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 3, 2)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 3, 3))]],
  dtype=object)

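Here you can see how the data is kept as objects, which you can then retrieve piece by piece to feed to your model.

To implement this solution, you have to introduce a loop in your function that calls each partition and feeds it to the neural network, so that training happens incrementally.

For details, see the Dask documentation.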
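The loop described above can be sketched as follows. This is a minimal illustration that uses plain NumPy slicing on a memory-mapped .npy file instead of Dask, so it is a swapped-in technique rather than the Dask approach itself. The helper name iter_batches, the temporary file, and the tiny 4x4 tiles are all hypothetical stand-ins for the real data.npy.

```python
import os
import tempfile

import numpy as np


def iter_batches(array, batch_size):
    """Yield consecutive slices of `array` along axis 0 as float32 arrays in RAM.

    Only the current slice is read from the memory-mapped file and converted,
    so peak memory depends on batch_size rather than on the full data set.
    """
    for start in range(0, len(array), batch_size):
        chunk = array[start:start + batch_size]
        # np.array copies the slice into an ordinary in-memory array;
        # dividing by 255 rescales 8-bit values to the range [0, 1].
        yield np.array(chunk, dtype=np.float32) / 255.0


# Hypothetical stand-in for data.npy: 10 tiny 4x4 RGB tiles, 8-bit unsigned.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "data.npy")
tiles = (np.arange(10 * 4 * 4 * 3) % 256).astype(np.uint8).reshape(10, 4, 4, 3)
np.save(path, tiles)

# Open the file read-only as a memory map; nothing is loaded yet.
images = np.lib.format.open_memmap(path, mode="r")

batch_shapes = [batch.shape for batch in iter_batches(images, batch_size=4)]
print(batch_shapes)
```

In a real training loop each yielded batch would be passed to the model (for example one training step per batch) instead of being collected in a list, and the masks would be sliced with the same start and stop indices so that images and masks stay aligned.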

Regarding "python - MemoryError when using Keras ImageDataGenerator", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49458905/
