tensorflow - Is the Adam optimizer really RMSprop plus momentum? If so, why doesn't it have a momentum parameter?

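Here is a link to the TensorFlow optimizers. You can see that RMSprop takes momentum as a parameter, whereas Adam does not. So I am confused. Adam is commonly described as RMSprop optimization with momentum, like this:

Adam = RMSprop + Momentum

But then why does RMSprop have a momentum parameter while Adam does not?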

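Best answer

Although the expression "Adam is RMSProp with momentum" is indeed widely used, it is only a very rough shorthand description and should not be taken at face value. The original Adam paper already clarifies this explicitly (p. 6):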

There are a few important differences between RMSProp with momentum and Adam: RMSProp with momentum generates its parameter updates using a momentum on the rescaled gradient, whereas Adam updates are directly estimated using a running average of first and second moment of the gradient.
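
The distinction drawn in the quote can be sketched in plain Python for a single scalar parameter. This is a minimal illustration, not any framework's actual implementation: the function names, the state tuples, and the default hyperparameters are assumptions chosen for the example, and the RMSProp-with-momentum variant shown is only one common formulation.

```python
import math

def rmsprop_momentum_step(theta, g, state, lr=0.001, rho=0.9, momentum=0.9, eps=1e-7):
    """One RMSProp-with-momentum step (hypothetical helper, scalar case)."""
    v, mom = state
    # Running average of the squared gradient (second moment only).
    v = rho * v + (1.0 - rho) * g * g
    # Momentum is applied to the *rescaled* gradient.
    mom = momentum * mom + lr * g / math.sqrt(v + eps)
    return theta - mom, (v, mom)

def adam_step(theta, g, state, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step (hypothetical helper, scalar case); t starts at 1."""
    m, v = state
    # Running averages of the raw gradient and of its square.
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g
    # Bias correction for the zero-initialized averages.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps), (m, v)

# Toy objective f(x) = x^2 with gradient 2x (an assumption for the demo).
theta_r, state_r = 1.0, (0.0, 0.0)
theta_a, state_a = 1.0, (0.0, 0.0)
for t in range(1, 4):
    theta_r, state_r = rmsprop_momentum_step(theta_r, 2.0 * theta_r, state_r)
    theta_a, state_a = adam_step(theta_a, 2.0 * theta_a, state_a, t)
    print(t, theta_r, theta_a)
```

The first function applies momentum to the already rescaled gradient, while the second rescales a running average of the raw gradient and bias-corrects both averages, which is why the two trajectories differ even with matching decay rates.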

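Sometimes authors make it clear that the expression in question is only a loose description, for example in the (highly recommended) Overview of gradient descent optimization algorithms (emphasis added):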

Adam also keeps an exponentially decaying average of past gradients m_t, similar to momentum.

or in Stanford CS231n: CNNs for Visual Recognition (emphasis added again):

Adam is a recently proposed update that looks a bit like RMSProp with momentum.

That said, some other frameworks do include a momentum parameter for Adam, but this is actually the beta1 parameter; here is CNTK:

momentum (float, list, output of momentum_schedule()) – momentum schedule. Note that this is the beta1 parameter in the Adam paper. For additional information, please refer to the this CNTK Wiki article.
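
One way to see why beta1 can reasonably be labeled "momentum" is to compare Adam's first-moment average with a classical momentum buffer. The sketch below is only an illustration: the helper names and the sample gradients are made up, and the classical buffer uses the form u = mu * u + g, which is one of several conventions. Under those assumptions, Adam's uncorrected first moment is exactly the classical buffer scaled by (1 - beta1).

```python
def classical_momentum_buffer(grads, mu):
    """Accumulate u_t = mu * u_{t-1} + g_t (hypothetical helper)."""
    u, history = 0.0, []
    for g in grads:
        u = mu * u + g
        history.append(u)
    return history

def adam_first_moment(grads, beta1):
    """Accumulate m_t = beta1 * m_{t-1} + (1 - beta1) * g_t, without bias correction."""
    m, history = 0.0, []
    for g in grads:
        m = beta1 * m + (1.0 - beta1) * g
        history.append(m)
    return history

grads = [1.0, -0.5, 2.0, 0.25]  # made-up gradient sequence
beta1 = 0.9

u_hist = classical_momentum_buffer(grads, beta1)
m_hist = adam_first_moment(grads, beta1)

# With the same decay rate, the two sequences differ only by a constant factor.
for u, m in zip(u_hist, m_hist):
    assert abs(m - (1.0 - beta1) * u) < 1e-12
```

So beta1 controls how quickly past gradients are forgotten, just as a momentum coefficient does, even though the rest of the Adam update differs from RMSProp with momentum.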

So don't take it too literally, and don't lose any sleep over it.

A similar question about "tensorflow - Is the Adam optimizer really RMSprop plus momentum? If so, why doesn't it have a momentum parameter?" can be found on Stack Overflow: https://stackoverflow.com/questions/61381648/
