tensorflow - Why is gradient clipping not supported with a distribution strategy in TensorFlow?

Tags: tensorflow

Gradient clipping does not appear to be supported when using a distribution strategy:
https://github.com/tensorflow/tensorflow/blob/f9f6b4cec2a1bdc5781e4896d80cee1336a2fbab/tensorflow/python/keras/optimizer_v2/optimizer_v2.py#L383

"Gradient clipping in the optimizer (by setting clipnorm or clipvalue) is currently unsupported when using a distribution strategy."


Is there a reason for this? I am tempted to define a custom def _minimize(strategy, tape, optimizer, loss, trainable_variables): that clips the gradients directly.

Best answer

GitHub user tomerk wrote:

There's two possible places to clip when you have distribution strategies enabled:

  • before gradients get aggregated (usually wrong)
  • after gradients get aggregated (usually right & what people expect)

We want it working w/ the second case (clipping after gradients are aggregated). The issue is the optimizers are written with clipping happening in the code before aggregation does.
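To make the difference between the two clipping points concrete, here is a toy illustration in plain Python rather than TensorFlow. It is only a sketch: the gradient values and the helpers `clip_by_norm` and `aggregate` are hypothetical names made up for this example, and aggregation is assumed to be a plain sum across replicas.

```python
import math


def clip_by_norm(grad, max_norm):
    """Scale a gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def aggregate(per_replica_grads):
    """Sum gradients element-wise across replicas (assumed all-reduce sum)."""
    return [sum(components) for components in zip(*per_replica_grads)]


# Hypothetical gradients for the same variable on two replicas.
replica_grads = [[3.0, 0.0], [0.0, 0.5]]
max_norm = 1.0

# Case 1: clip on each replica, then aggregate.
clipped_then_summed = aggregate(
    [clip_by_norm(g, max_norm) for g in replica_grads]
)

# Case 2: aggregate, then clip the combined gradient once.
summed_then_clipped = clip_by_norm(aggregate(replica_grads), max_norm)

print("clip before aggregation:", clipped_then_summed)
print("clip after aggregation: ", summed_then_clipped)
```

In this toy case, clipping before aggregation yields a combined gradient whose norm still exceeds `max_norm`, and whose direction differs from the true summed gradient because only the large replica gradient was rescaled. Clipping after aggregation keeps the direction of the summed gradient and enforces the norm bound on the update that is actually applied.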

We looked into changing this, but it would have required either:

  • api changes that break existing users of optimizer apply_gradients/other non-minimize methods
  • changing the signatures of methods optimizer implementers need to implement, breaking existing custom optimizers

So rather than:

  • quietly doing clipping in the wrong place
  • increasing churn & breaking existing users or existing custom optimizers just for this individual feature

We instead decided to leave this disabled for now. We'll roll support for this into a larger optimizer refactoring that solves a larger set of issues.


This is now implemented.

Original question on Stack Overflow: https://stackoverflow.com/questions/62619216/
