nlp - How to understand the hidden states returned by the BERT model? (Hugging Face Transformers)

Returns:

last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)): Sequence of hidden-states at the output of the last layer of the model.

pooler_output (torch.FloatTensor of shape (batch_size, hidden_size)): Last layer hidden-state of the first token of the sequence (classification token) further processed by a Linear layer and a Tanh activation function. The Linear layer weights are trained from the next sentence prediction (classification) objective during pre-training.

This output is usually not a good summary of the semantic content of the input, you’re often better with averaging or pooling the sequence of hidden-states for the whole input sequence.

hidden_states (tuple(torch.FloatTensor), optional, returned when config.output_hidden_states=True): Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

attentions (tuple(torch.FloatTensor), optional, returned when config.output_attentions=True): Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.

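This is from https://huggingface.co/transformers/model_doc/bert.html#bertmodel. Although the description in the documentation is clear, I still don't understand the hidden_states return value. It is a tuple with one entry for the output of the embeddings and one entry for the output of each layer.
Please tell me how to tell them apart, or what each of them means. Many thanks!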

Best answer

hidden_states (tuple(torch.FloatTensor), optional, returned when config.output_hidden_states=True): Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

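For a given token, its input representation is constructed by summing the corresponding token, segment, and position embeddings. This input representation is called the initial embedding output and can be found at index 0 of the tuple hidden_states.
This figure explains how the embeddings are computed.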
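
The sum described above can be checked against the library with a small sketch. It assumes PyTorch and the transformers package are installed, and it uses a tiny, randomly initialised BertModel (the config sizes and token ids are made up for illustration) so that nothing has to be downloaded. It also assumes the attribute names of the library's embedding module (word_embeddings, token_type_embeddings, position_embeddings, LayerNorm), which may differ between versions. Note that in this implementation the summed embeddings are passed through LayerNorm (and dropout during training) before they are returned as index 0.

```python
import torch
from transformers import BertConfig, BertModel

# Tiny random BERT; all sizes here are illustrative assumptions, not bert-base.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64,
                    max_position_embeddings=16)
model = BertModel(config).eval()  # eval() disables dropout

input_ids = torch.tensor([[1, 5, 7, 9, 2]])           # made-up token ids
token_type_ids = torch.zeros_like(input_ids)          # single segment
position_ids = torch.arange(input_ids.size(1)).unsqueeze(0)

with torch.no_grad():
    emb = model.embeddings
    # Token + segment + position embeddings, summed element-wise.
    summed = (emb.word_embeddings(input_ids)
              + emb.token_type_embeddings(token_type_ids)
              + emb.position_embeddings(position_ids))
    manual = emb.LayerNorm(summed)  # the module normalises the sum

    out = model(input_ids=input_ids, token_type_ids=token_type_ids,
                output_hidden_states=True)

# True if the manual computation matches the initial embedding output.
matches = torch.allclose(manual, out.hidden_states[0], atol=1e-5)
print(matches)
```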

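The remaining 12 elements of the tuple contain the outputs of the corresponding hidden layers (12 because BERT-base has 12 layers; in general there is one element per layer). For example, the last hidden layer can be found at index 12, which is the 13th item in the tuple. The initial embedding output and the hidden states all have shape [batch_size, sequence_length, hidden_size]. It is useful to compare the indexing of hidden_states, bottom-up, with this image from the BERT paper.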
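
The indexing can be inspected with a short sketch. It again assumes PyTorch and transformers are installed and uses a randomly initialised 12-layer model with small, made-up dimensions rather than the pretrained weights, so the values themselves are meaningless and only the structure matters. The last lines sketch the mean pooling that the quoted documentation suggests as an alternative to pooler_output; treating token id 0 as padding is an assumption of this example.

```python
import torch
from transformers import BertConfig, BertModel

# 12 layers like BERT-base, but with small illustrative dimensions.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=12,
                    num_attention_heads=2, intermediate_size=64,
                    max_position_embeddings=16)
model = BertModel(config).eval()

input_ids = torch.tensor([[1, 5, 7, 9, 2],
                          [1, 8, 3, 2, 0]])       # second row ends with padding
attention_mask = (input_ids != 0).long()          # assumption: id 0 is padding

with torch.no_grad():
    out = model(input_ids=input_ids, attention_mask=attention_mask,
                output_hidden_states=True)

hidden_states = out.hidden_states
num_entries = len(hidden_states)                  # embeddings + one per layer
shapes = {tuple(h.shape) for h in hidden_states}  # every entry has the same shape

# Index 0 is the embedding output; index 12 (the 13th item) is the last layer.
last_matches = torch.allclose(hidden_states[12], out.last_hidden_state)

# Mean pooling over non-padding tokens of the last layer.
mask = attention_mask.unsqueeze(-1).float()
mean_pooled = (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

print(num_entries, shapes, last_matches, tuple(mean_pooled.shape))
```

With pretrained weights, the same tuple should be available by loading the model with BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True) and reading hidden_states from the output.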

Regarding "nlp - How to understand the hidden states returned by the BERT model? (Hugging Face Transformers)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61323621/
