python - How to share layer weights in a custom Keras model function

Tags: python tensorflow keras deep-learning

I want to share the weights across both sides of a siamese model.

Given two sets of inputs, each set should pass through exactly the same model function with the same weights (the siamese part). The two outputs are then concatenated together to form the final output.

I have read how to share specific layers in the documentation (https://keras.io/getting-started/functional-api-guide/#shared-layers) and in other questions on this site, and that works.
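For reference, the pattern from the guide that does work is to instantiate a layer once and then call it on both inputs, e.g. (a minimal sketch):

from keras.layers import Input, Dense

a = Input(shape=(16,))
b = Input(shape=(16,))

shared_dense = Dense(64)   # instantiated once ...
out_a = shared_dense(a)    # ... and called on both inputs,
out_b = shared_dense(b)    # so both calls use the same weights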

But when I create my own multi-layer model function, Keras does not share the weights.

Here is a minimal example:

from keras.layers import Input, Dense, concatenate
from keras.models import Model

# Define inputs
input_a = Input(shape=(16,), dtype='float32')
input_b = Input(shape=(16,), dtype='float32')

# My simple model
def my_model(x):
    x = Dense(128, input_shape=(x.shape[1],), activation='relu')(x)
    x = Dense(128, activation='relu')(x)
    return x

# Instantiate model parameters to share
processed_a = my_model(input_a)
processed_b = my_model(input_b)

# Concatenate output vector
final_output = concatenate([processed_a, processed_b], axis=-1)

model = Model(inputs=[input_a, input_b], outputs=final_output)

If the weights were shared, this model should have (16*128 + 128) + (128*128 + 128) = 18,688 parameters in total. If we check:

model.summary()

it shows that we have twice that:

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_3 (InputLayer)            (None, 16)           0                                            
__________________________________________________________________________________________________
input_4 (InputLayer)            (None, 16)           0                                            
__________________________________________________________________________________________________
dense_5 (Dense)                 (None, 128)          2176        input_3[0][0]                    
__________________________________________________________________________________________________
dense_7 (Dense)                 (None, 128)          2176        input_4[0][0]                    
__________________________________________________________________________________________________
dense_6 (Dense)                 (None, 128)          16512       dense_5[0][0]                    
__________________________________________________________________________________________________
dense_8 (Dense)                 (None, 128)          16512       dense_7[0][0]                    
__________________________________________________________________________________________________
concatenate_2 (Concatenate)     (None, 256)          0           dense_6[0][0]                    
                                                                 dense_8[0][0]                    
==================================================================================================
Total params: 37,376
Trainable params: 37,376
Non-trainable params: 0
__________________________________________________________________________________________________
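The per-layer numbers follow the usual Dense parameter count, input_dim * units + units (weights plus biases); because each branch built its own copy of both layers, the total is doubled:

first_layer  = 16 * 128 + 128    # 2176, listed twice (dense_5 and dense_7)
second_layer = 128 * 128 + 128   # 16512, listed twice (dense_6 and dense_8)
print(2 * (first_layer + second_layer))  # 37376 instead of the intended 18688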

I don't know what I'm doing wrong. This is a simplified example; my real use case first loads a pretrained language model to encode/process the text input into vectors, and then applies this siamese model. Since it is a pretrained model, it is preferable to keep it in a separate function like this.

Thanks.

Best Answer

The problem is that when you call my_model, you are creating brand-new layers (i.e. the Dense layers are re-initialized on every call). What you want is to create each layer only once. That looks like this:

from keras.layers import Input, Dense, concatenate
from keras.models import Model

# Define inputs
input_a = Input(shape=(16,), dtype='float32')
input_b = Input(shape=(16,), dtype='float32')

# Instantiate model parameters to share
layer1 = Dense(128, input_shape=(input_a.shape[1],), activation='relu')
layer2 = Dense(128, activation='relu')
processed_a = layer2(layer1(input_a))
processed_b = layer2(layer1(input_b))

# Concatenate output vector
final_output = concatenate([processed_a, processed_b], axis=-1)

model = Model(inputs=[input_a, input_b], outputs=final_output)

Now model.summary() gives:

Model: "model_2"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_5 (InputLayer)            (None, 16)           0                                            
__________________________________________________________________________________________________
input_6 (InputLayer)            (None, 16)           0                                            
__________________________________________________________________________________________________
dense_5 (Dense)                 (None, 128)          2176        input_5[0][0]                    
                                                                 input_6[0][0]                    
__________________________________________________________________________________________________
dense_6 (Dense)                 (None, 128)          16512       dense_5[0][0]                    
                                                                 dense_5[1][0]                    
__________________________________________________________________________________________________
concatenate_2 (Concatenate)     (None, 256)          0           dense_6[0][0]                    
                                                                 dense_6[1][0]                    
==================================================================================================
Total params: 18,688
Trainable params: 18,688
Non-trainable params: 0
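A quick way to confirm the sharing, using the layer1/layer2 names from the snippet above (an optional sanity check, not part of the original answer):

# Each shared Dense layer owns exactly one kernel and one bias,
# no matter how many inputs it has been called on.
print(len(layer1.weights), len(layer2.weights))  # 2 2
print(model.count_params())                      # 18688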

Edit: If you only want to create the layers once, inside a function, you should use something like the following:

from keras.models import Sequential

# Instantiate model parameters to share
def my_model(x):
    return Sequential([Dense(128, input_shape=(x.shape[1],), activation='relu'),
                       Dense(128, activation='relu')])

# Create the Sequential model (and its layers) only once, then reuse it on both inputs
shared_model = my_model(input_a)
processed_a = shared_model(input_a)
processed_b = shared_model(input_b)
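Putting this together with the rest of the model, a minimal sketch (reusing the shared_model outputs from the snippet above):

# Concatenate the two branch outputs, exactly as before
final_output = concatenate([processed_a, processed_b], axis=-1)
model = Model(inputs=[input_a, input_b], outputs=final_output)
model.summary()  # should again report 18,688 trainable parameters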

Regarding python - How to share layer weights in a custom Keras model function, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59571518/
