Returns:
last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)): Sequence of hidden-states at the output of the last layer of the model.
pooler_output (torch.FloatTensor of shape (batch_size, hidden_size)): Last layer hidden-state of the first token of the sequence (classification token) further processed by a Linear layer and a Tanh activation function. The Linear layer weights are trained from the next sentence prediction (classification) objective during pre-training.
This output is usually not a good summary of the semantic content of the input, you’re often better with averaging or pooling the sequence of hidden-states for the whole input sequence.
hidden_states (tuple(torch.FloatTensor), optional, returned when config.output_hidden_states=True): Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when config.output_attentions=True): Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).
Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
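This is from https://huggingface.co/transformers/model_doc/bert.html#bertmodel. Although the description in the documentation is clear, I still don't understand the hidden_states return value. It is a tuple with one element for the output of the embeddings and one for the output of each layer.
Please tell me how to distinguish them, or what they mean. Many thanks!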
Best answer
hidden_states (tuple(torch.FloatTensor), optional, returned when config.output_hidden_states=True): Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
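For a given token, its input representation is constructed by summing the corresponding token, segment, and position embeddings. This input representation is called the initial embedding output and can be found at index 0 of the hidden_states tuple. This figure explains how the embeddings are calculated.
[Figure: how the embeddings are calculated]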
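The remaining 12 elements of the tuple (for a 12-layer model such as BERT-base) contain the outputs of the corresponding hidden layers. For example, the output of the last hidden layer can be found at index 12, which is the 13th item in the tuple. The initial embedding output and the hidden states all have shape [batch_size, sequence_length, hidden_size]. It is useful to compare the indices of hidden_states, bottom-up, with this image from the BERT paper.
[Figure: image from the BERT paper]
Regarding "nlp - How to understand the hidden states returned by the BERT model? (Hugging Face Transformers)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61323621/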
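The indexing described in the answer can be checked with a short sketch. It assumes a reasonably recent transformers release in which the model returns an output object with named fields, and it uses a small, randomly initialized BertModel rather than a pretrained checkpoint, so nothing is downloaded. The vocabulary size, hidden size, head count, and dummy input are arbitrary values chosen for illustration; only the 12 layers mirror BERT-base.

```python
import torch
from transformers import BertConfig, BertModel

# Hypothetical small configuration (not a pretrained checkpoint).
# Only num_hidden_layers=12 mirrors BERT-base; weights are random.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=12,
    num_attention_heads=4,
    intermediate_size=64,
    output_hidden_states=True,  # ask the model to return hidden_states
)
model = BertModel(config)
model.eval()  # disable dropout so the forward pass is deterministic

# Dummy batch: 2 sequences of 7 arbitrary token ids.
input_ids = torch.randint(0, config.vocab_size, (2, 7))

with torch.no_grad():
    outputs = model(input_ids=input_ids)

hidden_states = outputs.hidden_states

# Index 0: initial embedding output (token + segment + position embeddings).
embedding_output = hidden_states[0]

# Indices 1..12: output of each of the 12 encoder layers, bottom-up.
last_layer_output = hidden_states[12]

print(len(hidden_states))             # number of items in the tuple
print(tuple(embedding_output.shape))  # (batch_size, sequence_length, hidden_size)
print(torch.allclose(last_layer_output, outputs.last_hidden_state))
```

With a pretrained model, the same indexing should apply: the tuple has num_hidden_layers + 1 items, and the item at the last index corresponds to last_hidden_state.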