python - Adding data to the decoder in an autoencoder during learning

Tags: python tensorflow keras keras-layer tensor

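I want to implement an autoencoder in Keras. It is part of a larger network: some operations are applied to the autoencoder's output, and two losses then have to be taken into account. I have attached an image that shows the structure I propose. The link is below.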

autoencoder structure

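w has the same size as the input image. I do not use max pooling in this autoencoder, so the output of every stage has the same size as the input image. I want to send w together with the latent-space representation to the decoder, then add noise to the decoder output and try to extract w with the third part of the network. So my loss function needs to account for the difference between the input image and the latent-space representation, and between w and w'.

I have run into several implementation problems. I do not know how to add w to the decoder output, because the line "merge_encoded_w=cv2.merge(encoded,w)" raises an error and does not work. I am also not sure whether my loss function does what I need. Please help me with this code. I am a beginner and finding a solution is hard for me. I asked this question before, but nobody helped. My code is below: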

from keras.models import Sequential
from keras.layers import Input, Dense, Dropout, Activation,UpSampling2D,Conv2D, MaxPooling2D, GaussianNoise
from keras.models import Model
from keras.optimizers import SGD
from keras.datasets import mnist
from keras import regularizers
from keras import backend as K
import keras as k
import numpy as np
import matplotlib.pyplot as plt
import cv2
from time import time
from keras.callbacks import TensorBoard
# Embedding phase
##encoder

w=np.random.random((1, 28,28))
input_img = Input(shape=(28, 28, 1))  # adapt this if using `channels_first` image data format

x = Conv2D(8, (5, 5), activation='relu', padding='same')(input_img)
#x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
#x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(2, (3, 3), activation='relu', padding='same')(x)
encoded = Conv2D(1, (3, 3), activation='relu', padding='same')(x)
merge_encoded_w=cv2.merge(encoded,w)
#
#decoder

x = Conv2D(2, (5, 5), activation='relu', padding='same')(merge_encoded_w)
#x = UpSampling2D((2, 2))(x)
x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
#x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu',padding='same')(x)
#x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

#Extraction phase
decodedWithNois=k.layers.GaussianNoise(0.5)(decoded)
x = Conv2D(8, (5, 5), activation='relu', padding='same')(decodedWithNois)
#x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
#x = MaxPooling2D((2, 2), padding='same')(x)
final_image_watermark = Conv2D(2, (3, 3), activation='relu', padding='same')(x)


autoencoder = Model([input_img,w], [decoded,final_image_watermark(2)])
encoder=Model(input_img,encoded)
autoencoder.compile(optimizer='adadelta', loss=['mean_squared_error','mean_squared_error'],metrics=['accuracy'])
(x_train, _), (x_test, _) = mnist.load_data()
x_validation=x_train[1:10000,:,:]
x_train=x_train[10001:60000,:,:]
#
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_validation = x_validation.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))  # adapt this if using `channels_first` image data format
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))  # adapt this if using `channels_first` image data format
x_validation = np.reshape(x_validation, (len(x_validation), 28, 28, 1))  # adapt this if using `channels_first` image data format
autoencoder.fit(x_train, x_train,
                epochs=5,
                batch_size=128,
                shuffle=True,
                validation_data=(x_validation, x_validation),
                callbacks=[TensorBoard(log_dir='/tmp/autoencoder')])

decoded_imgs = autoencoder.predict(x_test)
encoded_imgs=encoder.predict(x_test)

Best answer

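For an architecture this large, I suggest building it in small pieces first and then putting those pieces together. Start with the encoder. It takes an image of shape (28, 28, 1) and returns an encoded image of shape (28, 28, 1).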

from keras.layers import Input, Concatenate, GaussianNoise
from keras.layers import Conv2D
from keras.models import Model

def make_encoder():
    image = Input((28, 28, 1))
    x = Conv2D(8, (5, 5), activation='relu', padding='same')(image)
    x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
    x = Conv2D(2, (3, 3), activation='relu', padding='same')(x)
    encoded =  Conv2D(1, (3, 3), activation='relu', padding='same')(x)

    return Model(inputs=image, outputs=encoded)
encoder = make_encoder()
encoder.summary()

#_________________________________________________________________
#Layer (type)                 Output Shape              Param #   
#=================================================================
#input_1 (InputLayer)         (None, 28, 28, 1)         0         
#_________________________________________________________________
#conv2d_1 (Conv2D)            (None, 28, 28, 8)         208       
#_________________________________________________________________
#conv2d_2 (Conv2D)            (None, 28, 28, 4)         292       
#_________________________________________________________________
#conv2d_3 (Conv2D)            (None, 28, 28, 2)         74        
#_________________________________________________________________
#conv2d_4 (Conv2D)            (None, 28, 28, 1)         19        
#=================================================================
#Total params: 593
#Trainable params: 593
#Non-trainable params: 0
#_________________________________________________________________

The shape transformations match what the design predicts.
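The parameter counts in the summary can be checked by hand as well. The following is a minimal sketch, assuming the standard Conv2D count of kernel weights plus one bias per filter; the helper name `conv2d_params` is made up for this illustration and is not part of Keras.

```python
def conv2d_params(kernel_size, in_channels, filters, use_bias=True):
    """Trainable parameters of a 2D convolution (hypothetical helper)."""
    kernel_height, kernel_width = kernel_size
    weights = kernel_height * kernel_width * in_channels * filters
    biases = filters if use_bias else 0
    return weights + biases

# (kernel_size, in_channels, filters) for the four encoder layers
encoder_layers = [
    ((5, 5), 1, 8),
    ((3, 3), 8, 4),
    ((3, 3), 4, 2),
    ((3, 3), 2, 1),
]
encoder_counts = [conv2d_params(k, c, f) for k, c, f in encoder_layers]
print(encoder_counts, sum(encoder_counts))  # [208, 292, 74, 19] 593
```

The per-layer values and the total agree with the "Param #" column printed by encoder.summary().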
Next, the decoder. It takes the encoding merged with another array, of shape (28, 28, 2), and reconstructs the original image, of shape (28, 28, 1).

def make_decoder():
    encoded_merged = Input((28, 28, 2))
    x = Conv2D(2, (5, 5), activation='relu', padding='same')(encoded_merged)
    x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu',padding='same')(x)
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x) 

    return Model(inputs=encoded_merged, outputs=decoded)
decoder = make_decoder()
decoder.summary()

#_________________________________________________________________
#Layer (type)                 Output Shape              Param #   
#=================================================================
#input_2 (InputLayer)         (None, 28, 28, 2)         0         
#_________________________________________________________________
#conv2d_5 (Conv2D)            (None, 28, 28, 2)         102       
#_________________________________________________________________
#conv2d_6 (Conv2D)            (None, 28, 28, 4)         76        
#_________________________________________________________________
#conv2d_7 (Conv2D)            (None, 28, 28, 8)         296       
#_________________________________________________________________
#conv2d_8 (Conv2D)            (None, 28, 28, 1)         73        
#=================================================================
#Total params: 547
#Trainable params: 547
#Non-trainable params: 0
#_________________________________________________________________

The model then also tries to recover the W array. Its input is the reconstructed image plus noise, of shape (28, 28, 1).

def make_w_predictor():
    decoded_noise = Input((28, 28, 1))
    x = Conv2D(8, (5, 5), activation='relu', padding='same')(decoded_noise)
    x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
    pred_w = Conv2D(1, (3, 3), activation='relu', padding='same')(x)  
    # reconsider activation (is W positive?)
    # should be filter=1 to match W
    return Model(inputs=decoded_noise, outputs=pred_w)

w_predictor = make_w_predictor()
w_predictor.summary()

#_________________________________________________________________
#Layer (type)                 Output Shape              Param #   
#=================================================================
#input_3 (InputLayer)         (None, 28, 28, 1)         0         
#_________________________________________________________________
#conv2d_9 (Conv2D)            (None, 28, 28, 8)         208       
#_________________________________________________________________
#conv2d_10 (Conv2D)           (None, 28, 28, 4)         292       
#_________________________________________________________________
#conv2d_11 (Conv2D)           (None, 28, 28, 1)         37        
#=================================================================
#Total params: 537
#Trainable params: 537
#Non-trainable params: 0
#_________________________________________________________________
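To get a feel for what the noisy input to this sub-model looks like, here is a rough NumPy sketch of zero-mean additive Gaussian noise with standard deviation 0.5. The function name `add_gaussian_noise` is hypothetical. Keras's GaussianNoise layer only adds noise during training and passes its input through unchanged at inference time, so this sketch mimics the training-time behaviour only.

```python
import numpy as np

def add_gaussian_noise(batch, stddev=0.5, rng=None):
    """Add zero-mean Gaussian noise to every element (illustrative helper)."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(loc=0.0, scale=stddev, size=batch.shape)
    return batch + noise

rng = np.random.default_rng(0)
decoded = rng.random((4, 28, 28, 1))   # stand-in for sigmoid outputs in [0, 1)
noisy = add_gaussian_noise(decoded, stddev=0.5, rng=rng)

# The shape is unchanged, but values can now fall outside [0, 1].
print(noisy.shape, float(noisy.min()), float(noisy.max()))
```

With a standard deviation of 0.5 on inputs in [0, 1], the noise is large relative to the signal, which is worth keeping in mind when judging how well W can be recovered.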

With all the parts in place, combining them into the full model is not hard. Note that the models you built above can be used just like layers.

def put_together(encoder, decoder, w_predictor):
    image = Input((28, 28, 1))
    w = Input((28, 28, 1))
    encoded = encoder(image)

    encoded_merged = Concatenate(axis=3)([encoded, w])
    decoded = decoder(encoded_merged)

    decoded_noise = GaussianNoise(0.5)(decoded)
    pred_w = w_predictor(decoded_noise)

    return Model(inputs=[image, w], outputs=[decoded, pred_w])

model = put_together(encoder, decoder, w_predictor)
model.summary()

#__________________________________________________________________________________________________
#Layer (type)                    Output Shape         Param #     Connected to                     
#==================================================================================================
#input_4 (InputLayer)            (None, 28, 28, 1)    0                                            
#__________________________________________________________________________________________________
#model_1 (Model)                 (None, 28, 28, 1)    593         input_4[0][0]                    
#__________________________________________________________________________________________________
#input_5 (InputLayer)            (None, 28, 28, 1)    0                                            
#__________________________________________________________________________________________________
#concatenate_1 (Concatenate)     (None, 28, 28, 2)    0           model_1[1][0]                    
#                                                                 input_5[0][0]                    
#__________________________________________________________________________________________________
#model_2 (Model)                 (None, 28, 28, 1)    547         concatenate_1[0][0]              
#__________________________________________________________________________________________________
#gaussian_noise_1 (GaussianNoise (None, 28, 28, 1)    0           model_2[1][0]                    
#__________________________________________________________________________________________________
#model_3 (Model)                 (None, 28, 28, 1)    537         gaussian_noise_1[0][0]           
#==================================================================================================
#Total params: 1,677
#Trainable params: 1,677
#Non-trainable params: 0
#__________________________________________________________________________________________________
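The channel-wise merge done by Concatenate(axis=3) can be illustrated with plain NumPy arrays. This is only a sketch of the shape arithmetic, with made-up constant arrays standing in for the encoder output and W. In the model itself the merge has to be a Keras layer, because `encoded` is a symbolic tensor rather than a NumPy array, which is why a NumPy/OpenCV call such as cv2.merge cannot be applied to it.

```python
import numpy as np

batch_size = 4
encoded = np.zeros((batch_size, 28, 28, 1), dtype="float32")  # stand-in for the encoder output
w = np.ones((batch_size, 28, 28, 1), dtype="float32")         # stand-in for W

# Stack the two single-channel arrays along the channel axis (axis 3).
merged = np.concatenate([encoded, w], axis=3)
print(merged.shape)  # (4, 28, 28, 2)

# Channel 0 holds the encoding, channel 1 holds W.
print(merged[0, 0, 0])  # [0. 1.]
```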

The code below trains the model on dummy data. You can of course use your own, as long as the shapes match.

import numpy as np

# dummy data
images = np.random.random((1000, 28, 28, 1))
w = np.random.lognormal(size=(1000, 28, 28, 1))

# is accuracy sensible metric for this model?
model.compile(optimizer='adadelta', loss='mse', metrics=['accuracy'])
model.fit([images, w], [images, w], batch_size=64, epochs=5)
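Since the question asks about two losses: when a model with two outputs is compiled with a single loss such as 'mse', Keras applies that loss to each output and minimizes their sum, and the `loss_weights` argument of compile can weight the terms. The NumPy sketch below only illustrates that arithmetic. The function names and the example weights (1.0, 0.5) are assumptions for illustration, not values from the original answer.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error over all elements."""
    return float(np.mean((y_true - y_pred) ** 2))

def total_loss(images, decoded, w, pred_w, loss_weights=(1.0, 1.0)):
    """Weighted sum of the image reconstruction loss and the W recovery loss."""
    image_loss = mse(images, decoded)
    w_loss = mse(w, pred_w)
    return loss_weights[0] * image_loss + loss_weights[1] * w_loss

# Tiny constant example so the numbers are easy to follow.
example_images = np.zeros((2, 28, 28, 1))
example_decoded = np.full((2, 28, 28, 1), 0.5)   # image MSE = 0.25
example_w = np.ones((2, 28, 28, 1))
example_pred_w = np.zeros((2, 28, 28, 1))        # W MSE = 1.0

combined = total_loss(example_images, example_decoded,
                      example_w, example_pred_w, loss_weights=(1.0, 0.5))
print(combined)  # 0.75
```

Lowering the second weight makes training favour image reconstruction over W recovery, and raising it does the opposite.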

EDIT:

I have some questions about the code you posted. In make_w_predictor you wrote: "# reconsider activation (is W positive?) # should be filter=1 to match W". What does that mean? W is an array that contains 0 and 1. What does "reconsider activation" mean? Should I change the code for this part?

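The relu activation returns non-negative values in [0, +inf), so it may not be a good choice if W takes values from a different range. Typical choices are:

  • W can be positive or negative: "linear" activation.
  • W is in [0, 1]: "sigmoid" activation.
  • W is in [-1, 1]: "tanh" activation.
  • W is non-negative: "relu" activation.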
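The output ranges listed above can be checked numerically. This is a small NumPy sketch with the four activations written out by hand. These are the usual textbook definitions, not calls into Keras.

```python
import numpy as np

def linear(x):
    return x

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(x, 0.0)

x = np.linspace(-5.0, 5.0, 11)   # sample inputs, negative and positive
activation_ranges = {}
for name, fn in [("linear", linear), ("sigmoid", sigmoid),
                 ("tanh", np.tanh), ("relu", relu)]:
    y = fn(x)
    activation_ranges[name] = (float(y.min()), float(y.max()))
    print(f"{name:8s} min={y.min():+.3f} max={y.max():+.3f}")
```

Only "linear" and "tanh" produce negative outputs, and only "sigmoid" and "tanh" are bounded above, which is what drives the choice for a given range of W.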

In the original code, you have:

w=np.random.random((1, 28, 28))

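which takes values between 0 and 1. So I suggest switching from "relu" to "sigmoid". I did not change my code example, though, because I was not sure whether this was intentional.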
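Note also that the assembled model takes W as a second input with one (28, 28, 1) array per image, so a single array of shape (1, 28, 28) does not match. Below is a hedged sketch of two ways to build a binary W batch with NumPy. Whether each image should get its own W or all images should share one is an assumption left to the reader, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
num_images = 1000

# Option 1: a different random binary W for every image.
w_per_image = rng.integers(0, 2, size=(num_images, 28, 28, 1)).astype("float32")

# Option 2: one binary W shared by all images, repeated along the batch axis.
single_w = rng.integers(0, 2, size=(1, 28, 28, 1)).astype("float32")
w_shared = np.repeat(single_w, num_images, axis=0)

print(w_per_image.shape, w_shared.shape)  # (1000, 28, 28, 1) (1000, 28, 28, 1)
```

Either array can be passed as the second input and as the second target in place of the dummy lognormal `w`.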

You said the filter should be 1. Does that mean changing (3,3) to (1,1)? I am sorry for these questions, but I am a beginner and cannot find some of the things you mention. Can you please help me and explain in full?

I was referring to this line in the original question:

final_image_watermark = Conv2D(2, (3, 3), activation='relu', padding='same')(x)

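If I understand correctly, this defines W' in the attached image, which should predict W and has shape (28, 28, 1). The first argument of Conv2D should then be 1; otherwise the output shape becomes (28, 28, 2). I made this change in my code example, because without it a shape mismatch error is raised: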

pred_w = Conv2D(1, (3, 3), activation='relu', padding='same')(x)

I think the (3, 3) part, which is the kernel size in Keras, is fine as it is.
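To make the distinction concrete: the first Conv2D argument (filters) sets the number of output channels, while the kernel size only affects the spatial dimensions, and with padding='same' and stride 1 it does not change them at all. The sketch below uses the usual output-size formulas for 'same' and 'valid' padding. The helper name `conv_output_size` is made up for this illustration.

```python
import math

def conv_output_size(input_size, kernel_size, stride=1, padding="same"):
    """Spatial output size of a convolution along one dimension (illustrative)."""
    if padding == "same":
        # Input is padded so that only the stride shrinks the output.
        return math.ceil(input_size / stride)
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError(f"unknown padding: {padding!r}")

for kernel in (1, 3, 5):
    same = conv_output_size(28, kernel, padding="same")
    valid = conv_output_size(28, kernel, padding="valid")
    print(f"kernel={kernel}: same -> {same}, valid -> {valid}")
```

With padding='same', a (1, 1), (3, 3) or (5, 5) kernel all keep the 28x28 size, so changing the kernel size is not needed to match the shape of W.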

Regarding "python - Adding data to the decoder in an autoencoder during learning", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52337636/
