nlp - How to understand the hidden states returned by the BERT model? (Hugging Face Transformers)

Tags: nlp pytorch huggingface-transformers bert-language-model

Returns last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)): Sequence of hidden-states at the output of the last layer of the model.

pooler_output (torch.FloatTensor of shape (batch_size, hidden_size)): Last layer hidden-state of the first token of the sequence (classification token) further processed by a Linear layer and a Tanh activation function. The Linear layer weights are trained from the next sentence prediction (classification) objective during pre-training.

This output is usually not a good summary of the semantic content of the input; you're often better off averaging or pooling the sequence of hidden-states for the whole input sequence.

hidden_states (tuple(torch.FloatTensor), optional, returned when config.output_hidden_states=True): Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden-states of the model at the output of each layer plus the initial embedding outputs.

attentions (tuple(torch.FloatTensor), optional, returned when config.output_attentions=True): Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.



This is from https://huggingface.co/transformers/model_doc/bert.html#bertmodel. Although the description in the documentation is clear, I still don't understand the hidden_states return value: it is a tuple with one element for the output of the embeddings and one for the output of each layer.
Could someone tell me how to distinguish them, and what each of them means? Thanks a lot!
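For reference, a minimal sketch of the kind of call that produces these returns (bert-base-uncased is just an example; a reasonably recent transformers version with ModelOutput-style returns is assumed):

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained(
    "bert-base-uncased",
    output_hidden_states=True,
    output_attentions=True,
)
model.eval()  # disable dropout so runs are deterministic

inputs = tokenizer("Hello, world!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Recent transformers versions return a ModelOutput; older versions
# return a plain tuple with the same entries in the same order.
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768)
print(outputs.pooler_output.shape)      # (1, 768)
print(len(outputs.hidden_states))       # 13: embeddings + 12 layers
print(len(outputs.attentions))          # 12: one per layer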

Best Answer

hidden_states (tuple(torch.FloatTensor), optional, returned when config.output_hidden_states=True): Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).


Hidden-states of the model at the output of each layer plus the initial embedding outputs.


For a given token, its input representation is constructed by summing the corresponding token embedding, segment embedding, and position embedding. This input representation is called the initial embedding output, and it can be found at index 0 of the hidden_states tuple.
The figure below illustrates how these embeddings are computed.

[Figure: BERT input representation: the element-wise sum of token, segment, and position embeddings]
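One way to convince yourself of this, continuing the snippet from the question: recompute the embedding output directly from the internal model.embeddings module and compare it with hidden_states[0]. Note that this relies on model internals, so it is version-dependent.

# Continuing the snippet from the question. hidden_states[0] is the
# initial embedding output: token + segment + position embeddings,
# followed by the embedding LayerNorm.
with torch.no_grad():
    emb = model.embeddings(
        input_ids=inputs["input_ids"],
        token_type_ids=inputs["token_type_ids"],
    )
print(torch.allclose(outputs.hidden_states[0], emb))  # True (model is in eval mode)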
The remaining 12 elements of the tuple contain the outputs of the corresponding encoder layers. For example, the output of the last hidden layer can be found at index 12, which is the 13th item in the tuple. Both the initial embedding output and every hidden state have shape [batch_size, sequence_length, hidden_size]. It is instructive to compare the indices of hidden_states, bottom-up, with this figure from the BERT paper.
[Figure: BERT architecture from the BERT paper: the embedding layer followed by 12 encoder layers]
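Putting the indexing together: the sketch below (continuing the snippet from the question) checks that the last entry of hidden_states coincides with last_hidden_state, and adds the averaging over hidden-states that the quoted docs recommend over pooler_output. The exact masked-mean recipe is one common choice, not something the docs prescribe.

# Continuing the snippet from the question.
# The last entry of hidden_states is the output of the 12th encoder
# layer, i.e. the same tensor as last_hidden_state.
print(torch.equal(outputs.hidden_states[12], outputs.last_hidden_state))  # True

# Masked mean pooling: average the last layer over real (non-padding)
# tokens to get one vector per input sequence.
def mean_pool(last_hidden_state, attention_mask):
    mask = attention_mask.unsqueeze(-1).float()     # (batch, seq_len, 1)
    summed = (last_hidden_state * mask).sum(dim=1)  # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)        # (batch, 1)
    return summed / counts

sentence_vec = mean_pool(outputs.last_hidden_state, inputs["attention_mask"])
print(sentence_vec.shape)  # (1, 768)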

Regarding nlp - How to understand the hidden states returned by the BERT model? (Hugging Face Transformers), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61323621/
