tensorflow - How do I add an Attention layer to a TensorFlow GRU model?

I built a language translation model with the TensorFlow Functional API.

Here is the model:

# encoder 
encoder = tf.keras.Input(shape=(200, ))
enc_embd = tf.keras.layers.Embedding(vocab_train, embedding_dim)(encoder)
encoder_gru = tf.keras.layers.GRU(units, return_sequences=True, return_state=True)
output_e, hidden_e = encoder_gru(enc_embd)

# decoder
decoder = tf.keras.Input(shape=(200, ))
dec_embd = tf.keras.layers.Embedding(vocab_label, embedding_dim)(decoder)
decoder_gru = tf.keras.layers.GRU(units, return_sequences=True, return_state=True)
output_d, hidden_d = decoder_gru(dec_embd, initial_state = hidden_e)
final_output = tf.keras.layers.Dense(vocab_label, activation='softmax')
output_f = final_output(output_d)

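How can I add a fully connected tf.keras.layers.Attention layer between the encoder and the decoder?

Best answer

You can use an Attention layer between output_e and output_d. Below is a complete example: we build separate models for the encoder and the decoder, then merge them into a single encoder-decoder model (named ae in the code).

Define the parameters and dummy data: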

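import numpy as np
import tensorflow as tf
from tensorflow.keras import Model

vocab_train = 111   # source vocabulary size
vocab_label = 123   # target vocabulary size
embedding_dim = 64
units = 32          # GRU hidden size
n_sample = 10
seq_length = 200

# Random token ids for the encoder and decoder inputs
X_enc = np.random.randint(0, vocab_train, (n_sample, seq_length))
X_dec = np.random.randint(0, vocab_label, (n_sample, seq_length))
# Random 0/1 targets: enough to check that the model trains, but not valid one-hot labels
y = np.random.randint(0, 2, (n_sample, seq_length, vocab_label))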
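
The random y above only exercises the shapes. With real data, categorical_crossentropy expects one-hot targets, and the decoder is commonly trained to predict the next token of its own input. The following is a minimal sketch of that setup, not part of the original answer: the labels array and the one-step shift are assumptions made for illustration.

```python
import numpy as np

vocab_label = 123
n_sample, seq_length = 10, 200

# Hypothetical integer-encoded target sentences (dummy data, one extra step for the shift)
labels = np.random.randint(0, vocab_label, (n_sample, seq_length + 1))

X_dec = labels[:, :-1]  # decoder input: tokens 0 .. T-1
y_ids = labels[:, 1:]   # targets: tokens 1 .. T (next-token prediction)

# One-hot encode by indexing an identity matrix with the target ids
y = np.eye(vocab_label, dtype="float32")[y_ids]  # (n_sample, seq_length, vocab_label)
```

If the one-hot array gets too large, keeping the integer y_ids and compiling with 'sparse_categorical_crossentropy' is a possible alternative.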

Define the encoder (it must also return hidden_e, because the decoder uses it):

encoder = tf.keras.Input(shape=(seq_length, ))
enc_embd = tf.keras.layers.Embedding(vocab_train, embedding_dim)(encoder)
encoder_gru = tf.keras.layers.GRU(units, return_sequences=True, return_state=True)
output_e, hidden_e = encoder_gru(enc_embd)

enc = Model(encoder, [hidden_e, output_e])

Define the decoder with Attention (it also receives output_e and hidden_e as inputs):

decoder = tf.keras.Input(shape=(seq_length, ))
hidden_e_input = tf.keras.Input(shape=(units, ))
output_e_input = tf.keras.Input(shape=(seq_length, units))
dec_embd = tf.keras.layers.Embedding(vocab_label, embedding_dim)(decoder)
decoder_gru = tf.keras.layers.GRU(units, return_sequences=True, return_state=True)
output_d, hidden_d = decoder_gru(dec_embd, initial_state = hidden_e_input)
# Attention takes [query, value]: each decoder step queries the encoder outputs
att = tf.keras.layers.Attention()([output_d, output_e_input])
concat = tf.keras.layers.Concatenate()([att, output_d])
final_output = tf.keras.layers.Dense(vocab_label, activation='softmax')(concat)

dec = Model([decoder, hidden_e_input, output_e_input], final_output)
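
To make the role of the Attention layer concrete, here is a rough NumPy sketch of what it computes when the decoder outputs act as the query and the encoder outputs act as the value. It assumes the layer's default settings (plain dot-product scores, no learned scale, key equal to value); the function name dot_product_attention is made up for illustration and is not a Keras API.

```python
import numpy as np

def dot_product_attention(query, value):
    """Unscaled dot-product attention with key == value.

    query: (batch, Tq, dim), e.g. decoder outputs
    value: (batch, Tv, dim), e.g. encoder outputs
    Returns the context (batch, Tq, dim) and the weights (batch, Tq, Tv).
    """
    scores = query @ value.transpose(0, 2, 1)             # (batch, Tq, Tv)
    scores = scores - scores.max(axis=-1, keepdims=True)  # stabilize the softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    context = weights @ value                              # weighted sum of encoder outputs
    return context, weights

rng = np.random.default_rng(0)
output_d = rng.normal(size=(2, 5, 4))  # 5 decoder steps
output_e = rng.normal(size=(2, 7, 4))  # 7 encoder steps
context, weights = dot_product_attention(output_d, output_e)
print(context.shape)  # (2, 5, 4)
print(weights.shape)  # (2, 5, 7)
```

The context has one vector per decoder step, which is why it can be concatenated with output_d along the last axis before the final Dense layer.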

Combine the encoder and decoder:

inp_e = tf.keras.Input(shape=(seq_length, ))
h_e, o_e = enc(inp_e)
inp_d = tf.keras.Input(shape=(seq_length, ))
out = dec([inp_d, h_e, o_e])

ae = Model([inp_e, inp_d], out)
ae.compile('adam', 'categorical_crossentropy')
ae.fit([X_enc, X_dec], y, epochs=3)

This question, "How do I add an Attention layer to a TensorFlow GRU model?", comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/68280503/
