Should I avoid combining L2 regularization with RMSprop and NAG?
Does the L2 regularization term interfere with adaptive gradient algorithms such as RMSprop?
Best regards,
Accepted answer
It seems this problem (raised in 2017) has since been addressed (in 2018): plain adaptive gradient methods (RMSProp, Adagrad, Adam, etc.) do not pair well with L2 regularization.
Paper link [https://arxiv.org/pdf/1711.05101.pdf] and an excerpt:
In this paper, we show that a major factor of the poor generalization of the most popular adaptive gradient method, Adam, is due to the fact that L2 regularization is not nearly as effective for it as for SGD.
L2 regularization and weight decay are not identical. Contrary to common belief, the two techniques are not equivalent. For SGD, they can be made equivalent by a reparameterization of the weight decay factor based on the learning rate; this is not the case for Adam. In particular, when combined with adaptive gradients, L2 regularization leads to weights with large gradients being regularized less than they would be when using weight decay.
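The quoted distinction can be made concrete with a minimal sketch (my own illustration in NumPy, not code from the paper): with coupled L2 regularization the decay term is added to the gradient and therefore gets divided by Adam's adaptive denominator, while decoupled weight decay (AdamW-style) shrinks the weights directly after the adaptive update.

```python
import numpy as np

def adam_step(w, grad, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8,
              l2=0.0, decoupled_wd=0.0, t=1):
    """One Adam step (illustrative sketch, not a library API).

    l2           -- coupled L2 regularization: the decay term joins the
                    gradient and is rescaled by Adam's 1/sqrt(v_hat).
    decoupled_wd -- decoupled weight decay: applied straight to the
                    weights, bypassing the adaptive scaling (as argued
                    for in the paper quoted above).
    """
    g = grad + l2 * w                       # L2 enters the adaptive statistics
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    w = w - lr * decoupled_wd * w           # decoupled decay, unscaled
    return w, m, v
```

The sketch exposes the effect described in the excerpt: with a zero data gradient, the coupled-L2 update is `m_hat / sqrt(v_hat)`, which Adam normalizes to roughly 1 regardless of how strong the decay is, so the step size barely reflects the regularization strength; the decoupled term, by contrast, shrinks weights in direct proportion to `decoupled_wd`.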
A similar question on Stack Overflow, "Should I avoid to use L2 regularization in conjunction with RMSProp?": https://stackoverflow.com/questions/42415319/