python - Different `grad_fn` for similar-looking operations in Pytorch (1.0)

Tags: python pytorch attention-model

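I am working on an attention model, and before running the final model I was checking the tensor shapes that flow through the code. One operation requires me to reshape a tensor. The tensor has shape torch.Size([30, 8, 9, 64]), where 30 is the batch_size, 8 is the number of attention heads (not relevant to my question), 9 is the number of words in the sentence, and 64 is some intermediate embedding representation of each word. I have to reshape the tensor to torch.Size([30, 9, 512]) before processing it further. The references I found online do this with x.transpose(1, 2).contiguous().view(30, -1, 512), whereas I thought this should work just as well: x.transpose(1, 2).reshape(30, -1, 512).

In the first case the grad_fn is <ViewBackward>, whereas in my case it is <UnsafeViewBackward>. Aren't these two the same operations? Will this result in a training error?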
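The comparison described in the question can be reproduced with a short script. This is a minimal sketch, assuming a random input with the shapes given above; the weight tensor `w` is hypothetical and only there to produce a non-trivial gradient. The exact `grad_fn` class names may carry a version-dependent suffix, so the script prints them rather than assuming them.

```python
import torch

torch.manual_seed(0)

# Shapes from the question: batch 30, 8 heads, 9 words, 64 features per head.
x = torch.randn(30, 8, 9, 64, requires_grad=True)

# Variant from the reference code: explicit contiguous copy, then view.
a = x.transpose(1, 2).contiguous().view(30, -1, 512)
# Variant proposed in the question: reshape on the non-contiguous tensor.
b = x.transpose(1, 2).reshape(30, -1, 512)

print(a.shape, b.shape)          # both torch.Size([30, 9, 512])
print(torch.equal(a, b))         # True: identical values
print(type(a.grad_fn).__name__)  # backward node of the view variant
print(type(b.grad_fn).__name__)  # backward node of the reshape variant

# Hypothetical weights, used only to make the gradient check meaningful.
w = torch.randn(30, 9, 512)
(grad_a,) = torch.autograd.grad((a * w).sum(), x)
(grad_b,) = torch.autograd.grad((b * w).sum(), x)
print(torch.allclose(grad_a, grad_b))  # compares gradients w.r.t. x
```

Comparing the gradients with respect to `x` is a quick way to check whether the differing `grad_fn` nodes change what the model would learn from in this setup.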

Best answer

Aren't these two the same operations?

No. Although both effectively produce the same tensor, the operations are not the same, and the results are not guaranteed to have the same storage.
From TensorShape.cpp:

// _unsafe_view() differs from view() in that the returned tensor isn't treated
// as a view for the purposes of automatic differentiation. (It's not listed in
// VIEW_FUNCTIONS in gen_autograd.py).  It's only safe to use if the `self` tensor
// is temporary. For example, the viewed tensor here (a + b) is discarded immediately
// after viewing:
//
//  res = at::_unsafe_view(a + b, size);
//
// This is a hack because in-place operations on tensors treated like views
// can be much more expensive than the same operations on non-view tensors.

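Note that this may produce an error when applied to complex inputs, but complex inputs are generally not yet fully supported in PyTorch, and the problem is not unique to this function.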
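The storage point made earlier in this answer can be checked with `data_ptr()`, which returns the address of a tensor's first element. This is a minimal sketch, assuming the shapes from the question: `reshape` returns a view when the requested shape is compatible with the existing strides and falls back to a copy otherwise, while `view` never copies and raises an error instead.

```python
import torch

x = torch.randn(30, 8, 9, 64)

# Contiguous input: reshape can return a view, so no data is copied.
flat = x.reshape(30, -1)
print(flat.data_ptr() == x.data_ptr())  # True: same underlying memory

# Transposed (non-contiguous) input: the target shape cannot be expressed
# with strides alone, so reshape copies the data into fresh memory.
t = x.transpose(1, 2)
print(t.is_contiguous())                # False
r = t.reshape(30, -1, 512)
print(r.data_ptr() == x.data_ptr())     # False: r owns a separate copy

# view() never copies; on this layout it raises instead, which is why the
# reference code calls .contiguous() first.
try:
    t.view(30, -1, 512)
    view_failed = False
except RuntimeError:
    view_failed = True
print(view_failed)                      # True
```

In the question's case the transpose makes the tensor non-contiguous, so both variants end up copying the data once before the final shape change.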

Original question on Stack Overflow: https://stackoverflow.com/questions/55835557/
