python - Why does accuracy drop when using ReLU activation after the linear layers?

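I have started using PyTorch and am building a very basic CNN on the FashionMNIST dataset. I noticed some strange behavior that I cannot explain: in the forward function, when I apply ReLU after every linear layer, the network's accuracy drops.

Here is the code for my custom neural network: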

import torch.nn as nn
import torch.optim as optim

# `device` is assumed to be defined earlier (e.g. a CUDA device index).

# custom neural network class
class FashionMnistClassifier(nn.Module):
  def __init__(self, n_inputs, n_out):
    super().__init__()
    # The layers are moved to the GPU by the model.cuda(device) call below,
    # so they do not need individual .cuda(device) calls.
    self.cnn1 = nn.Conv2d(n_inputs, out_channels=32, kernel_size=5)
    self.cnn2 = nn.Conv2d(32, out_channels=64, kernel_size=5)
    #self.cnn3 = nn.Conv2d(n_inputs, out_channels=32, kernel_size=5)
    self.fc1 = nn.Linear(64*4*4, out_features=100)
    self.fc2 = nn.Linear(100, out_features=n_out)
    self.relu = nn.ReLU()
    self.pool = nn.MaxPool2d(kernel_size=2)
    self.soft_max = nn.Softmax(dim=1)  # never used in forward()

  def forward(self, x):
    # Tensor.cuda() returns a copy and does not move x in place, so reassign it.
    x = x.cuda(device)
    out = self.relu(self.cnn1(x))
    out = self.pool(out)
    out = self.relu(self.cnn2(out))
    out = self.pool(out)
    #print("out shape in classifier forward func: ", out.shape)
    out = self.fc1(out.view(out.size(0), -1))  # flatten to (batch, 64*4*4)
    #out = self.relu(out) # if I uncomment these then the Accuracy decrease from 90 to 50!!!
    out = self.fc2(out)
    #out = self.relu(out) # this too
    return out

n_batch = 100
n_outputs = 10
LR = 0.001

model = FashionMnistClassifier(1, 10).cuda(device)
optimizer = optim.Adam(model.parameters(), lr=LR)
criterion = nn.CrossEntropyLoss()

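So if I use ReLU only after the CNN layers, I get 90% accuracy, but when I uncomment those lines and apply ReLU after the linear layers as well, accuracy drops to 50%. I don't know why this happens, because I thought that using an activation after every linear layer would always improve classification accuracy. I had assumed that we should always use activation functions for classification problems and can skip them for linear regression. In my case, although this is a classification problem, I get better performance when I do not use an activation function after the linear layers. Can someone explain this to me?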

Best answer

CrossEntropyLoss expects unnormalized logits (the raw output of the last Linear layer) as its input.
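
As a minimal sketch of what "unnormalized logits" means here, the snippet below compares nn.CrossEntropyLoss on raw scores with an explicit log-softmax followed by negative log-likelihood. The batch of scores and the target labels are made up for illustration and are not from the question.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical batch: 4 samples, 10 classes, raw (unnormalized) scores.
logits = torch.randn(4, 10)
targets = torch.tensor([1, 0, 9, 3])

# CrossEntropyLoss takes the raw logits and normalizes them internally.
ce = nn.CrossEntropyLoss()(logits, targets)

# The same computation spelled out: log-softmax, then negative log-likelihood.
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

same = torch.allclose(ce, manual)
print(ce.item(), manual.item(), same)
```

Because the normalization already happens inside the loss, the model itself does not need a Softmax (or any other activation) on its final output.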

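If you apply ReLU to the output of the last layer, you only output values in the range [0, inf), whereas a neural network tends to use small values for incorrect labels and large values for the correct one (we could say it is overconfident in its predictions). With ReLU in place, every negative logit is clamped to zero, so the network can no longer push an incorrect label's score below zero. Also, argmax picks the label with the highest logit value as the prediction.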
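
One way to see the effect is a small sketch with hand-picked numbers (hypothetical, not taken from the question): when every pre-activation output for a sample is negative, ReLU turns them all into zeros, the loss sits at log(number of classes), and no gradient flows back through the clamped values.

```python
import math
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Hypothetical outputs of the last Linear layer for one sample, all negative.
pre = torch.tensor([[-2.0, -0.5, -1.0, -3.0]], requires_grad=True)
target = torch.tensor([1])

# With ReLU on the output: every score becomes 0, so all classes tie.
clamped_loss = criterion(torch.relu(pre), target)
clamped_loss.backward()
print(clamped_loss.item(), math.log(4))  # both are log(4)
print(pre.grad)                          # all zeros: nothing to learn from

# Without ReLU: the same scores give a usable gradient.
raw = pre.detach().clone().requires_grad_(True)
raw_loss = criterion(raw, target)
raw_loss.backward()
print(raw.grad)                          # softmax(raw) - one_hot(target)
```

This is only one failure mode, and how often it occurs in a real training run depends on initialization and data, but it shows why clamping the logits can stall learning.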

So it definitely should not work with this line:

# out = self.relu(out) # this too

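It should work with a ReLU before it, though (that is, after fc1). Keep in mind that more non-linearity is not always good for a network.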
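
The arrangement described above could be sketched as follows. The class name LogitClassifier and the use of nn.Sequential are illustrative choices, not the asker's code; the layer sizes follow the question and assume 28x28 single-channel inputs. The sketch runs on CPU with random data, so it demonstrates the structure only, not the reported accuracy.

```python
import torch
import torch.nn as nn

class LogitClassifier(nn.Module):
    """Hypothetical variant: ReLU after the hidden Linear layer, none after the last."""

    def __init__(self, n_inputs=1, n_out=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(n_inputs, 32, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc1 = nn.Linear(64 * 4 * 4, 100)
        self.fc2 = nn.Linear(100, n_out)

    def forward(self, x):
        out = self.features(x)
        out = torch.relu(self.fc1(out.flatten(1)))  # hidden layer: ReLU here
        return self.fc2(out)                        # output layer: raw logits

model = LogitClassifier()
x = torch.randn(8, 1, 28, 28)          # random stand-in for a FashionMNIST batch
targets = torch.randint(0, 10, (8,))

logits = model(x)
loss = nn.CrossEntropyLoss()(logits, targets)  # loss consumes raw logits
pred = logits.argmax(dim=1)                    # predicted label per sample
print(logits.shape, pred.shape, loss.item())
```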

Source: a similar question on Stack Overflow, https://stackoverflow.com/questions/58050105/