python - Moving a tensor to the cuda device causes an illegal memory access in PyTorch

Tags: python deep-learning pytorch google-colaboratory

I am trying the following snippet in Colab, and it fails with the error shown below.
Is it wrong to move a tensor object to the CUDA device?

import torch
a = torch.Tensor(torch.randn(5,5,5))
# a.device("cuda")
device = torch.device("cuda")
class abc(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w1 = torch.nn.Linear(5,5)

    def forward(self,x):
        return self.w1(x)
mod = abc()
a.cuda()
mod.to(device)
mod(a)
Output:
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-10-5372fb4d5512> in <module>()
     11         return self.w1(x)
     12 mod = abc()
---> 13 a.cuda()
     14 mod.to(device)
     15 mod(a)

RuntimeError: CUDA error: an illegal memory access was encountered

Best answer

This worked for me on Google Colab:

import torch
a = torch.randn(5,5,5)
a = a.to("cuda") # or just a = torch.randn((5,5,5), device='cuda')

class abc(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w1 = torch.nn.Linear(5,5)

    def forward(self,x):
        return self.w1(x)
mod = abc()
mod.to("cuda")
mod(a)
Output:
tensor([[[ 1.5691e+00,  8.0326e-01,  1.4352e+00,  7.3295e-01,  3.2156e-01],
         [ 5.1630e-01, -2.2816e-03,  7.1052e-01,  1.9250e-01,  8.3110e-01],
         [ 7.6572e-01, -8.9701e-01,  2.7974e-01,  7.4309e-04,  9.5218e-01],
         [ 2.0723e-01, -1.0049e+00,  1.6938e+00,  1.0019e+00,  7.9305e-01],
         [-1.0973e-02, -1.1260e-01,  1.0521e+00, -1.3839e-01, -4.2380e-01]],

        [[ 1.3870e+00,  1.1620e+00, -3.6523e-01, -5.6704e-01,  4.2481e-01],
         [ 1.6204e-01,  8.3231e-02, -5.9607e-01, -1.0912e+00, -6.1651e-01],
         [ 2.3584e-01, -5.9825e-01,  1.1670e+00,  9.3185e-01,  4.0269e-01],
         [ 1.3120e+00,  1.3967e-01, -5.5048e-01, -9.8143e-01,  3.5059e-01],
         [ 8.0019e-01, -1.8983e-02,  2.3792e-01, -5.9157e-01,  3.5816e-01]],

        [[ 3.9709e-01, -8.7349e-01, -2.9742e-01, -3.8732e-01, -1.7191e-03],
         [-8.7056e-01, -8.8214e-01,  1.0647e+00,  7.7785e-01,  6.3816e-01],
         [ 7.4920e-01, -4.0143e-01,  5.9780e-01,  2.7842e-01,  8.1991e-01],
         [-5.9389e-02, -4.9465e-01, -3.7322e-03, -7.0475e-01, -2.5955e-01],
         [ 1.5722e+00,  6.4410e-01, -5.1310e-02, -1.2716e+00, -1.4607e-01]],

        [[ 6.5152e-02, -6.8772e-01,  1.0366e+00, -2.4278e-01, -2.7106e-01],
         [ 7.0832e-01,  1.4581e-01,  1.9924e-01, -4.1930e-01,  4.0567e-01],
         [ 3.9120e-01, -1.0099e+00,  1.6907e+00,  7.2674e-01,  6.5285e-01],
         [-1.3191e-01, -8.6324e-01, -1.2734e-01, -5.6135e-01, -4.1949e-01],
         [ 5.4183e-02, -5.6837e-01,  5.1347e-02, -5.3199e-01,  2.2167e-01]],

        [[ 9.9237e-02, -5.8725e-01, -3.3042e-01, -8.7371e-01, -2.3261e-01],
         [ 5.5485e-01, -3.5022e-01,  1.1516e-01,  3.8139e-02,  4.6032e-01],
         [-7.5111e-01, -9.7203e-01,  1.7809e-01,  2.2506e-01,  3.6540e-02],
         [ 2.5590e-01,  3.0592e-01,  6.8972e-01,  1.8452e-01,  6.7794e-01],
         [-7.6091e-01, -1.3956e+00,  7.8801e-01, -1.7489e-01, -1.0143e+00]]],
       device='cuda:0', grad_fn=<AddBackward0>)
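
A note on why the reassignment in `a = a.to("cuda")` matters: `Tensor.cuda()` and `Tensor.to()` return a tensor on the target device and leave the original where it was, whereas `Module.to()` moves the module's parameters in place. In the question, the result of `a.cuda()` is discarded, so even when that call succeeds, `mod(a)` would receive a CPU tensor while the module lives on the GPU. The sketch below illustrates the difference. The names `moved`, `layer` and `same` are mine, and the CPU fallback is an assumption added so the snippet also runs without a GPU.

```python
import torch

# Assumption: fall back to the CPU when no GPU is available, so the sketch
# runs anywhere. On a Colab GPU runtime this resolves to "cuda".
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.randn(5, 5, 5)

# Tensor.to() / Tensor.cuda() return a tensor on the target device.
# `a` itself is unchanged unless the result is assigned back to it.
moved = a.to(device)
print(a.device)      # device of the original tensor
print(moved.device)  # device of the returned tensor

# nn.Module.to() moves the parameters in place and returns the same module,
# so `layer.to(device)` works without reassignment.
layer = torch.nn.Linear(5, 5)
same = layer.to(device)
print(same is layer)
print(layer.weight.device)

# Input and module are now on the same device, so the forward pass works.
out = layer(moved)
print(out.shape)
```

As for the `illegal memory access` itself: CUDA errors are reported asynchronously, and once one occurs the CUDA context of the process generally stays unusable. If a call as simple as `a.cuda()` raises it, my guess is that an earlier cell in the same session already crashed on the GPU, in which case restarting the Colab runtime before rerunning the code is worth trying.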

Regarding "python - Moving a tensor to the cuda device causes an illegal memory access in PyTorch", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/63951247/
