machine-learning - Should I avoid combining L2 regularization with RMSProp?

Tags: machine-learning neural-network backpropagation

Should I avoid combining L2 regularization with RMSprop and NAG?

Does the L2 regularization term interfere with the gradient algorithm (RMSprop)?

Best regards,

Best Answer

It seems this problem (raised in 2017) has since been addressed (in 2018).

Plain adaptive gradient methods (RMSProp, Adagrad, Adam, etc.) do not work well with L2 regularization.

Paper link [https://arxiv.org/pdf/1711.05101.pdf], with a short excerpt:

In this paper, we show that a major factor of the poor generalization of the most popular adaptive gradient method, Adam, is due to the fact that L2 regularization is not nearly as effective for it as for SGD.

L2 regularization and weight decay are not identical. Contrary to common belief, the two techniques are not equivalent. For SGD, they can be made equivalent by a reparameterization of the weight decay factor based on the learning rate; this is not the case for Adam. In particular, when combined with adaptive gradients, L2 regularization leads to weights with large gradients being regularized less than they would be when using weight decay.
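To make the quoted point concrete, here is a minimal sketch (NumPy only, not from any particular library; the function names and hyperparameters are illustrative) contrasting an RMSProp-style update where the L2 penalty is folded into the gradient with one that uses decoupled weight decay. Because the L2 term passes through the adaptive denominator, coordinates with large accumulated gradients end up being shrunk less; decoupled decay shrinks every weight by the same relative amount.

```python
import numpy as np

def rmsprop_l2_step(w, grad, cache, lr=1e-3, rho=0.9, eps=1e-8, l2=1e-2):
    """L2 regularization: the penalty l2 * w is added to the raw gradient,
    so it is also divided by sqrt(cache). Weights whose gradients have been
    large are therefore regularized *less*."""
    g = grad + l2 * w
    cache = rho * cache + (1 - rho) * g ** 2
    w = w - lr * g / (np.sqrt(cache) + eps)
    return w, cache

def rmsprop_decoupled_wd_step(w, grad, cache, lr=1e-3, rho=0.9, eps=1e-8, wd=1e-2):
    """Decoupled weight decay (the AdamW idea, applied here to RMSProp):
    the decay acts directly on the weights, outside the adaptive scaling."""
    cache = rho * cache + (1 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps) - lr * wd * w
    return w, cache

# Toy comparison: two equal weights, one of which sees a much larger gradient.
w = np.array([1.0, 1.0])
grad = np.array([10.0, 0.1])
cache = np.zeros_like(w)

w_l2, _ = rmsprop_l2_step(w.copy(), grad, cache.copy())
w_wd, _ = rmsprop_decoupled_wd_step(w.copy(), grad, cache.copy())
print("L2-in-gradient update:  ", w_l2)
print("decoupled weight decay: ", w_wd)
```

In the L2-in-gradient variant the shrinkage from the penalty is roughly lr * l2 * w / sqrt(cache), so the large-gradient coordinate is barely decayed, whereas the decoupled variant applies lr * wd * w uniformly. This is the asymmetry the paper identifies.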

The original question on Stack Overflow: https://stackoverflow.com/questions/42415319/
