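1. Split the embedding into two separate objects

One approach is to use two separate embeddings: one for the pretrained vectors and another for the vectors that still have to be trained.

The GloVe one should be frozen, while tokens that have no pretrained representation are looked up in the trainable layer.

This works if you format your data so that tokens with a pretrained representation occupy a lower index range than tokens without a GloVe representation. Let's say your pretrained indices are in the range [0, 300], while those without a representation are in [301, 500]. I would go with something along these lines: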

import numpy as np
import torch


class YourNetwork(torch.nn.Module):
    def __init__(self, glove_embeddings: np.ndarray, how_many_tokens_not_present: int):
        super().__init__()
        # from_pretrained expects a float tensor and freezes the weights by default
        self.pretrained_embedding = torch.nn.Embedding.from_pretrained(
            torch.from_numpy(glove_embeddings).float()
        )
        self.trainable_embedding = torch.nn.Embedding(
            how_many_tokens_not_present, glove_embeddings.shape[1]
        )
        # Rest of your network setup

    def forward(self, batch):
        # Tokens in batch without a pretrained representation should have indices
        # that come AFTER the pretrained ones, adjust your data creating function
        # accordingly. Note >=: valid pretrained indices end at num_embeddings - 1
        mask = batch >= self.pretrained_embedding.num_embeddings

        # You may want to optimize it; clone() (tensors have no copy()) keeps the
        # caller's batch untouched
        pretrained_batch = batch.clone()
        pretrained_batch[mask] = 0

        embedded_batch = self.pretrained_embedding(pretrained_batch)

        # Every token without representation has to be brought into appropriate range
        # (out of place, so the input batch is not modified)
        trainable_batch = batch - self.pretrained_embedding.num_embeddings
        # Zero out the ones which already have pretrained embedding
        trainable_batch[~mask] = 0
        non_pretrained_embedded_batch = self.trainable_embedding(trainable_batch)

        # And finally change appropriate tokens from placeholder embedding created by
        # pretrained into trainable embeddings.
        embedded_batch[mask] = non_pretrained_embedded_batch[mask]

        # Rest of your code
        ...

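To make the index layout concrete, here is a minimal, self-contained sketch of the two-embedding idea. The class name `TwoPartEmbedding`, the random `fake_glove` matrix standing in for real GloVe vectors, and the use of `torch.where` instead of in-place assignment are all assumptions made for illustration, not part of the approach described above.

```python
import numpy as np
import torch


class TwoPartEmbedding(torch.nn.Module):
    """Frozen pretrained rows followed by trainable rows (hypothetical helper)."""

    def __init__(self, pretrained: np.ndarray, num_trainable: int):
        super().__init__()
        # from_pretrained freezes the weights by default (freeze=True)
        self.pretrained_embedding = torch.nn.Embedding.from_pretrained(
            torch.from_numpy(pretrained).float()
        )
        self.trainable_embedding = torch.nn.Embedding(num_trainable, pretrained.shape[1])

    def forward(self, batch: torch.Tensor) -> torch.Tensor:
        offset = self.pretrained_embedding.num_embeddings
        mask = batch >= offset  # True where the token has no pretrained vector
        # Out-of-range positions are replaced by the placeholder index 0
        frozen = self.pretrained_embedding(batch.masked_fill(mask, 0))
        trained = self.trainable_embedding((batch - offset).masked_fill(~mask, 0))
        # Pick the trainable vector where mask is True, the frozen one elsewhere
        return torch.where(mask.unsqueeze(-1), trained, frozen)


# Random stand-in for a GloVe matrix: indices [0, 300], 50 dimensions (assumed sizes)
fake_glove = np.random.rand(301, 50).astype(np.float32)
layer = TwoPartEmbedding(fake_glove, num_trainable=200)  # covers indices [301, 500]

batch = torch.tensor([[0, 300, 301], [12, 500, 7]])
out = layer(batch)
out.sum().backward()

print(out.shape)                               # torch.Size([2, 3, 50])
print(layer.pretrained_embedding.weight.grad)  # None, the layer is frozen
print(layer.trainable_embedding.weight.grad[0])  # gradient reached token 301
```

Only `trainable_embedding` receives gradients here, so an optimizer built from `layer.parameters()` leaves the pretrained rows alone.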

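2. Zero the gradients for specified tokens

This one is a bit tricky, but I think it's pretty concise and easy to implement. If you obtain the indices of the tokens that do have a GloVe representation, you can explicitly zero their gradient after backpropagation and before the optimizer step, so those rows will not get updated.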

import torch

embedding = torch.nn.Embedding(10, 3)
X = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])

values = embedding(X)
loss = values.mean()

# Use whatever loss you want
loss.backward()

# Let's say those indices in your embedding are pretrained (have GloVe representation)
indices = torch.LongTensor([2, 4, 5])

print("Before zeroing out gradient")
print(embedding.weight.grad)

print("After zeroing out gradient")
embedding.weight.grad[indices] = 0
print(embedding.weight.grad)

Output of the second approach:

Before zeroing out gradient
tensor([[0.0000, 0.0000, 0.0000],
        [0.0417, 0.0417, 0.0417],
        [0.0833, 0.0833, 0.0833],
        [0.0417, 0.0417, 0.0417],
        [0.0833, 0.0833, 0.0833],
        [0.0417, 0.0417, 0.0417],
        [0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000],
        [0.0417, 0.0417, 0.0417]])
After zeroing out gradient
tensor([[0.0000, 0.0000, 0.0000],
        [0.0417, 0.0417, 0.0417],
        [0.0000, 0.0000, 0.0000],
        [0.0417, 0.0417, 0.0417],
        [0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000],
        [0.0417, 0.0417, 0.0417]])
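
In a real training loop the zeroing has to happen on every step, between `backward()` and `optimizer.step()`. One way to avoid repeating it by hand is a tensor hook; the sketch below assumes rows 2, 4 and 5 are the pretrained ones (as in the snippet above) and uses plain SGD. The helper name `zero_frozen_rows` is made up for this example. Note that an optimizer with weight decay would still shrink the frozen rows, since the decay term is added inside the optimizer after the hook has run.

```python
import torch

torch.manual_seed(0)
embedding = torch.nn.Embedding(10, 3)

# Assumption for this sketch: rows 2, 4 and 5 hold pretrained vectors
frozen_rows = torch.tensor([2, 4, 5])


def zero_frozen_rows(grad: torch.Tensor) -> torch.Tensor:
    # Return a modified copy; a hook should not change its argument in place
    grad = grad.clone()
    grad[frozen_rows] = 0
    return grad


# The hook runs on every backward pass, before the gradient is stored in .grad
embedding.weight.register_hook(zero_frozen_rows)

# Plain SGD without momentum or weight decay, so a zero gradient means no update
optimizer = torch.optim.SGD(embedding.parameters(), lr=0.1)
before = embedding.weight.detach().clone()

X = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])
for _ in range(3):
    optimizer.zero_grad()
    loss = embedding(X).mean()  # Use whatever loss you want
    loss.backward()
    optimizer.step()

after = embedding.weight.detach()
print(torch.equal(before[frozen_rows], after[frozen_rows]))  # True
print(torch.equal(before[1], after[1]))                      # False
```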

Source: the Stack Overflow question "Is it possible to freeze only certain embedding weights in a PyTorch embedding layer?": https://stackoverflow.com/questions/54924582/
