lstm - RuntimeError: Expected hidden[0] size (2, 20, 256), got (2, 50, 256)

Tags: lstm pytorch recurrent-neural-network

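I get this error while trying to build a multi-class text classification network with an LSTM (RNN). The code appears to run fine during training but throws the error in the validation part. The network architecture and training code are below. Any help is appreciated.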

I adapted existing code that uses an RNN to predict sentiment, replacing the sigmoid with a softmax function and changing the loss function from BCE Loss to NLLLoss().

def forward(self, x, hidden):
    """
    Perform a forward pass of our model on some input and hidden state.
    """
    batch_size = x.size(0)
    embeds = self.embedding(x)
    lstm_out,hidden= self.lstm(embeds,hidden)

    # stack up lstm outputs
    lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)

    # dropout and fully-connected layer
    out = self.dropout(lstm_out)
    out = self.fc(out)

    # softmax function
    soft_out = self.sof(out)

    # reshape to be batch_size first
    soft_out = soft_out.view(batch_size, -1)
    # soft_out = soft_out[:, -1] # get last batch of labels

    # return last sigmoid output and hidden state
    return soft_out, hidden


def init_hidden(self, batch_size):
    ''' Initializes hidden state '''
    # Create two new tensors with sizes n_layers x batch_size x hidden_dim,
    # initialized to zero, for hidden state and cell state of LSTM
    weight = next(self.parameters()).data

    if (train_on_gpu):
        hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda(),
              weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda())
    else:
        hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_(),
                  weight.new(self.n_layers, batch_size, self.hidden_dim).zero_())

    return hidden

# Instantiate the model w/ hyperparams
vocab_size = len(vocab_to_int)+1
output_size = 44
embedding_dim = 100
hidden_dim = 256
n_layers = 2

net = ClassificationRNN(vocab_size, output_size, embedding_dim, hidden_dim, n_layers)

print(net)

# loss and optimization functions
lr=0.001

criterion = nn.NLLLoss()

optimizer = torch.optim.Adam(net.parameters(), lr=lr)

# training params

epochs = 4 # 3-4 is approx where I noticed the validation loss stop decreasing

counter = 0
print_every = 100
clip=5 # gradient clipping

# move model to GPU, if available
if(train_on_gpu):
    net.cuda()

net.train()
# train for some number of epochs
for e in range(epochs):
    # initialize hidden state
    h = net.init_hidden(batch_size)

    # batch loop
    for inputs, labels in train_loader:
        counter += 1

        if(train_on_gpu):
            inputs, labels = inputs.cuda(), labels.cuda()

        # Creating new variables for the hidden state, otherwise
        # we'd backprop through the entire training history
        h = tuple([each.data for each in h])

        # zero accumulated gradients
        net.zero_grad()

        # get the output from the model
        output, h = net(inputs, h)

#         print('output:',output.squeeze())
#         print('labels:',labels.float())

        # calculate the loss and perform backprop
        loss = criterion(output, labels)
        loss.backward()
        # `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.
        nn.utils.clip_grad_norm_(net.parameters(), clip)
        optimizer.step()

        # loss stats
        if counter % print_every == 0:
            # Get validation loss
            val_h = net.init_hidden(batch_size)
            val_losses = []
            net.eval()
            for inputs, labels in valid_loader:

                # Creating new variables for the hidden state, otherwise
                # we'd backprop through the entire training history
                val_h = tuple([each.data for each in val_h])

                if(train_on_gpu):
                    inputs, labels = inputs.cuda(), labels.cuda()

                output, val_h = net(inputs, val_h)

                val_loss = criterion(output, labels)

                val_losses.append(val_loss.item())

            net.train()
            print("Epoch: {}/{}...".format(e+1, epochs),
                  "Step: {}...".format(counter),
                  "Loss: {:.6f}...".format(loss.item()),
                  "Val Loss: {:.6f}".format(np.mean(val_losses)))

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-41-805ed880b453> in <module>()
     58                     inputs, labels = inputs.cuda(), labels.cuda()
     59 
---> 60                 output, val_h = net(inputs, val_h)
     61 
     62                 val_loss = criterion(output, labels)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

<ipython-input-38-dbfb8d384231> in forward(self, x, hidden)
     34         batch_size = x.size(0)
     35         embeds = self.embedding(x)
---> 36         lstm_out,hidden= self.lstm(embeds,hidden)
     37 
     38          # stack up lstm outputs

Best answer

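Try adding drop_last=True to the line that loads your data with DataLoader, for example when loading the training data from the dataset train_data:

train_loader = DataLoader(train_data, shuffle=True, batch_size=batch_size, drop_last=True)

Explanation:
The error is probably caused by your data not being evenly divisible by the batch size. Suppose your training data has 130 items and the batch size is 8: the last batch will then contain only 2 items (the remainder of 130 / 8). Setting drop_last=True drops those 2 items. In this question the shapes in the error message suggest the hidden state was created for a batch of 50 while the final batch held only 20 samples. Since the traceback points at the validation loop, the same option would also be needed on valid_loader.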
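
The arithmetic in the explanation can be checked with a small sketch. It uses plain Python rather than PyTorch, and the helper name batch_sizes is hypothetical (it is not part of any library); it only mirrors how a loader splits a dataset into batches with and without drop_last.

```python
def batch_sizes(n_items, batch_size, drop_last=False):
    """Return the size of each batch a loader would yield (illustrative helper)."""
    full, remainder = divmod(n_items, batch_size)
    sizes = [batch_size] * full
    if remainder and not drop_last:
        sizes.append(remainder)  # short final batch that triggers the mismatch
    return sizes


keep = batch_sizes(130, 8)                  # 16 full batches plus one of size 2
drop = batch_sizes(130, 8, drop_last=True)  # 16 full batches only

print(len(keep), keep[-1])  # 17 2
print(len(drop), drop[-1])  # 16 8
```

If dropping samples is undesirable, one alternative (not part of the answer above) is to build the hidden state from the actual batch, for example by calling net.init_hidden(inputs.size(0)) for each batch in the validation loop, at the cost of no longer carrying the hidden state from one batch to the next.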

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/54878904/
