machine-learning - What is weight decay?

I recently started working with ML and TensorFlow. While going through the CIFAR10 tutorial on the website, I came across a paragraph that confused me a bit:

The usual method for training a network to perform N-way classification is multinomial logistic regression, aka. softmax regression. Softmax regression applies a softmax nonlinearity to the output of the network and calculates the cross-entropy between the normalized predictions and a 1-hot encoding of the label. For regularization, we also apply the usual weight decay losses to all learned variables. The objective function for the model is the sum of the cross entropy loss and all these weight decay terms, as returned by the loss() function.


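Now, from the text above, I understand that loss() is made up of the cross-entropy loss (that is, the difference between the predictions and the correct label values) and the weight decay losses.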





[Image: cost function J made up of a data-loss term plus an L2 penalty on the weights (theta)]

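The second term in the equation above defines the L2 regularization of the weights (theta). It is generally added to avoid overfitting. It penalizes peaky weights and makes sure that all the inputs are taken into account. (Having only a few peaky weights would mean that only the inputs connected to them are considered for the decision.)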
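To make the "peaky weights" point concrete, here is a minimal sketch. It assumes the penalty is written as (lam / 2) * sum(w^2); the helper name `l2_penalty`, the two example weight vectors and the value of `lam` are made up for illustration and do not come from the tutorial.

```python
def l2_penalty(weights, lam):
    """Return (lam / 2) * sum of squared weights (assumed form of the penalty)."""
    return 0.5 * lam * sum(w * w for w in weights)

# Two hypothetical weight vectors whose entries both sum to 1.0.
peaky = [1.0, 0.0, 0.0, 0.0]        # all the mass on a single input
spread = [0.25, 0.25, 0.25, 0.25]   # mass shared across every input

lam = 1.0  # hypothetical regularization strength

peaky_cost = l2_penalty(peaky, lam)
spread_cost = l2_penalty(spread, lam)

print(peaky_cost, spread_cost)  # prints: 0.5 0.125
```

Because the penalty grows with the square of each weight, the vector that concentrates everything on one input costs four times as much here as the one that spreads the same total across all inputs.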

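During the gradient descent parameter update, the L2 regularization above ultimately means that every weight is shrunk toward zero by a constant factor at each step: W_new = (1 - alpha*lambda) * W_old - alpha * delta_J0/delta_W, where J0 is the loss without the regularization term, alpha is the learning rate, and lambda is the regularization strength (assuming the penalty is written as (lambda/2) * W^2). That is why it is generally called weight decay.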
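The equivalence between an L2 penalty and a per-step shrink of the weights can be checked numerically. The sketch below rests on several assumptions: a single scalar weight, a toy data loss J0 = 0.5 * (w*x - y)^2, a penalty of the form (lam / 2) * w^2, and a plain gradient descent step w - alpha * gradient. All function names and numeric values are hypothetical.

```python
def grad_data_loss(w, x, y):
    """Gradient of the toy unregularized loss J0 = 0.5 * (w*x - y)**2 w.r.t. w."""
    return (w * x - y) * x

def step_l2(w, x, y, alpha, lam):
    """One gradient descent step on J0 + (lam / 2) * w**2."""
    grad = grad_data_loss(w, x, y) + lam * w  # the penalty contributes lam * w
    return w - alpha * grad

def step_decay(w, x, y, alpha, lam):
    """Shrink the weight by (1 - alpha * lam), then step on J0 alone."""
    return (1.0 - alpha * lam) * w - alpha * grad_data_loss(w, x, y)

# Hypothetical starting weight, data point and hyperparameters.
w_a = w_b = 2.0
x, y = 1.5, 3.0
alpha, lam = 0.1, 0.01

for _ in range(50):
    w_a = step_l2(w_a, x, y, alpha, lam)
    w_b = step_decay(w_b, x, y, alpha, lam)

# The two update rules agree up to floating-point rounding.
print(abs(w_a - w_b) < 1e-9)

# With a zero data gradient (x = 0), only the shrink factor acts on the weight.
only_decay = step_decay(1.0, 0.0, 0.0, alpha, lam)
print(only_decay)
```

Note that this equivalence holds for plain gradient descent as assumed here; optimizers that rescale the gradient may treat an L2 penalty and an explicit decay step differently.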

