python - How does one have parameters in a PyTorch model not be leaves and be in the computation graph?

Tags: python machine-learning neural-network pytorch

I am trying to update/change the parameters of a neural net model and then have the forward pass of the updated neural net be in the computation graph (no matter how many changes/updates we make).

I tried this idea, but whenever I do it PyTorch sets my updated tensors (inside the model) to be leaves, which blocks the flow of gradients to the networks I want to receive gradients. It kills the flow of gradients because leaf nodes are not part of the computation graph in the way I want them to be (since they are not truly leaves).
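
To see the failure in isolation: load_state_dict copies the new values into the module's existing leaf Parameters (under no_grad), so the graph connecting them to anything upstream is cut. A minimal sketch of just that step (the names here are hypothetical, but it is the same pattern as the full code below):

import torch
import torch.nn as nn

net = nn.Linear(1, 1)
h = torch.randn(1, requires_grad=True)

sd = net.state_dict()
sd['weight'] = net.weight + h   # a non-leaf tensor, connected to h
net.load_state_dict(sd)         # copies the values into the existing leaf Parameter

print(net.weight.is_leaf)       # True: still a leaf
net(torch.randn(1)).sum().backward()
print(h.grad)                   # None: the graph back to h was severed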

I have tried multiple things, but nothing seems to work. I created a self-contained dummy code that prints the gradients of the networks I want to have gradients:

import torch
import torch.nn as nn

import copy

from collections import OrderedDict

# img = torch.randn([8,3,32,32])
# targets = torch.LongTensor([1, 2, 0, 6, 2, 9, 4, 9])
# img = torch.randn([1,3,32,32])
# targets = torch.LongTensor([1])
x = torch.randn(1)
target = 12.0*x**2

criterion = nn.CrossEntropyLoss()

#loss_net = nn.Sequential(OrderedDict([('conv0',nn.Conv2d(in_channels=3,out_channels=10,kernel_size=32))]))
loss_net = nn.Sequential(OrderedDict([('fc0', nn.Linear(in_features=1,out_features=1))]))

hidden = torch.randn(size=(1,1),requires_grad=True)
updater_net = nn.Sequential(OrderedDict([('fc0',nn.Linear(in_features=1,out_features=1))]))
print(f'updater_net.fc0.weight.is_leaf = {updater_net.fc0.weight.is_leaf}')
#
nb_updates = 2
for i in range(nb_updates):
    print(f'i = {i}')
    new_params = copy.deepcopy( loss_net.state_dict() )
    ## w^<t> := f(w^<t-1>,delta^<t-1>)
    for (name, w) in loss_net.named_parameters():
        print(f'name = {name}')
        print(w.size())
        hidden = updater_net(hidden).view(1)
        print(hidden.size())
        #delta = ((hidden**2)*w/2)
        delta = w + hidden
        wt = w + delta
        print(wt.size())
        new_params[name] = wt
        #del loss_net.fc0.weight
        #setattr(loss_net.fc0, 'weight', nn.Parameter( wt ))
        #setattr(loss_net.fc0, 'weight', wt)
        #loss_net.fc0.weight = wt
        #loss_net.fc0.weight = nn.Parameter( wt )
    ##
    loss_net.load_state_dict(new_params)
#
print()
print(f'updater_net.fc0.weight.is_leaf = {updater_net.fc0.weight.is_leaf}')
outputs = loss_net(x)
loss_val = 0.5*(target - outputs)**2
loss_val.backward()
print()
print(f'-- params that dont matter if they have gradients --')
print(f'loss_net.grad = {loss_net.fc0.weight.grad}')
print('-- params we want to have gradients --')
print(f'hidden.grad = {hidden.grad}')
print(f'updater_net.fc0.weight.grad = {updater_net.fc0.weight.grad}')
print(f'updater_net.fc0.bias.grad = {updater_net.fc0.bias.grad}')
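
For reference, as written this prints None for hidden.grad and for both updater_net gradients; that missing gradient flow is exactly the problem.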

If anybody knows how to do this, please give me a ping... I set the number of updates to 2 because the update operation should be able to appear in the computation graph an arbitrary number of times... so it has to work for 2.

Strongly related posts:
  • SO: How does one have parameters in a pytorch model not be leafs and be in the computation graph? (https://stackoverflow.com/questions/60271131/)
  • PyTorch forums: https://discuss.pytorch.org/t/how-does-one-have-the-parameters-of-a-model-not-be-leafs/70076

  • Cross-posted:
  • Quora: https://www.quora.com/unanswered/How-does-one-have-parameters-in-a-PyTorch-model-not-be-leaves-and-be-in-the-computation-graph
  • Reddit: https://www.reddit.com/r/pytorch/comments/f5gu3g/how_does_one_have_parameters_in_a_pytorch_model/
Best Answer

    This doesn't work properly, because the named parameters get deleted from the module: once a Parameter has been replaced by a plain tensor, it no longer shows up in loss_net.named_parameters(), so later update iterations have nothing to loop over.
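
    (Why the commented-out attempts in the question fail: nn.Module.__setattr__ refuses to replace a registered Parameter with a plain tensor, and wrapping the update in nn.Parameter creates a fresh leaf, which again cuts the graph. A quick sketch of the first point, assuming a bare nn.Linear:)

    import torch
    import torch.nn as nn

    net = nn.Linear(1, 1)
    try:
        net.weight = net.weight * 2   # plain tensor into a registered-Parameter slot
    except TypeError as e:
        print(e)  # cannot assign 'torch.FloatTensor' as parameter 'weight' (torch.nn.Parameter or None expected)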

    This seems to work:

    import torch
    import torch.nn as nn
    
    from torchviz import make_dot
    
    import copy
    
    from collections import OrderedDict
    
    # img = torch.randn([8,3,32,32])
    # targets = torch.LongTensor([1, 2, 0, 6, 2, 9, 4, 9])
    # img = torch.randn([1,3,32,32])
    # targets = torch.LongTensor([1])
    x = torch.randn(1)
    target = 12.0*x**2
    
    criterion = nn.CrossEntropyLoss()
    
    #loss_net = nn.Sequential(OrderedDict([('conv0',nn.Conv2d(in_channels=3,out_channels=10,kernel_size=32))]))
    loss_net = nn.Sequential(OrderedDict([('fc0', nn.Linear(in_features=1,out_features=1))]))
    
    hidden = torch.randn(size=(1,1),requires_grad=True)
    updater_net = nn.Sequential(OrderedDict([('fc0',nn.Linear(in_features=1,out_features=1))]))
    print(f'updater_net.fc0.weight.is_leaf = {updater_net.fc0.weight.is_leaf}')
    #
    def del_attr(obj, names):
        # Recursively delete obj.<names[0]>.<names[1]>...; deleting a
        # registered nn.Parameter this way de-registers it from the module.
        if len(names) == 1:
            delattr(obj, names[0])
        else:
            del_attr(getattr(obj, names[0]), names[1:])
    def set_attr(obj, names, val):
        # Recursively set obj.<names[0]>.<names[1]>... = val; because the
        # Parameter was deleted first, val is stored as a plain (non-leaf) tensor.
        if len(names) == 1:
            setattr(obj, names[0], val)
        else:
            set_attr(getattr(obj, names[0]), names[1:], val)
    
    nb_updates = 2
    for i in range(nb_updates):
        print(f'i = {i}')
        new_params = copy.deepcopy( loss_net.state_dict() )  # leftover from the question's version; unused here
        ## w^<t> := f(w^<t-1>,delta^<t-1>)
        for (name, w) in list(loss_net.named_parameters()):  # list(): we mutate the module while iterating
            hidden = updater_net(hidden).view(1)
            #delta = ((hidden**2)*w/2)
            delta = w + hidden
            wt = w + delta
            del_attr(loss_net, name.split("."))
            set_attr(loss_net, name.split("."), wt)
        ##
    #
    print()
    print(f'updater_net.fc0.weight.is_leaf = {updater_net.fc0.weight.is_leaf}')
    print(f'loss_net.fc0.weight.is_leaf = {loss_net.fc0.weight.is_leaf}')
    outputs = loss_net(x)
    loss_val = 0.5*(target - outputs)**2
    loss_val.backward()
    print()
    print(f'-- params that dont matter if they have gradients --')
    print(f'loss_net.grad = {loss_net.fc0.weight.grad}')
    print('-- params we want to have gradients --')
    print(f'hidden.grad = {hidden.grad}') # None because this is not a leaf; it was overridden in the for loop above.
    print(f'updater_net.fc0.weight.grad = {updater_net.fc0.weight.grad}')
    print(f'updater_net.fc0.bias.grad = {updater_net.fc0.bias.grad}')
    make_dot(loss_val)
    

    Output:
    updater_net.fc0.weight.is_leaf = True
    i = 0
    i = 1
    
    updater_net.fc0.weight.is_leaf = True
    loss_net.fc0.weight.is_leaf = False
    
    -- params that dont matter if they have gradients --
    loss_net.grad = None
    -- params we want to have gradients --
    hidden.grad = None
    updater_net.fc0.weight.grad = tensor([[0.7152]])
    updater_net.fc0.bias.grad = tensor([-7.4249])
    

    Credits: the mighty albanD from the PyTorch team: https://discuss.pytorch.org/t/how-does-one-have-the-parameters-of-a-model-not-be-leafs/70076/9?u=pinocchio
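
    (For reference: newer PyTorch exposes this same "forward pass with externally supplied, graph-connected parameters" pattern directly; a minimal sketch using torch.func.functional_call, assuming PyTorch >= 2.0:)

    import torch
    import torch.nn as nn
    from torch.func import functional_call

    net = nn.Linear(1, 1)
    h = torch.randn(1, requires_grad=True)

    # build updated, non-leaf parameters without touching the module itself
    params = {name: p + h for name, p in net.named_parameters()}

    x = torch.randn(1)
    out = functional_call(net, params, (x,))  # forward pass using the new params
    out.sum().backward()
    print(h.grad)                             # gradient flows back to h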
