python - TensorFlow multi-GPU example error: Variable conv1/weights/ExponentialMovingAverage/ does not exist

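I am running the code from this tutorial: https://www.tensorflow.org/tutorials/deep_cnn/

I downloaded the code from here: https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10/

I am running the code on a g2.4xlarge machine on AWS with Ubuntu 14.04. The single-GPU example runs fine, without any errors.

Can anyone help me resolve this problem? I am running version 0.12.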

ubuntu@ip-xxx-xx-xx-xx:~/pythonworkspace/tensorflowdev/models-master/tutorials/image/cifar10$ python -c 'import tensorflow as tf; print(tf.__version__)'

0.12.head

ubuntu@ip-xxx-xx-xx-xx:~/pythonworkspace/tensorflowdev/models-master/tutorials/image/cifar10$ python cifar10_multi_gpu_train.py --num_gpus=2

>> Downloading cifar-10-binary.tar.gz 100.0%
Successfully downloaded cifar-10-binary.tar.gz 170052171 bytes.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
WARNING:tensorflow:From /home/ubuntu/pythonworkspace/tensorflowdev/models-master/tutorials/image/cifar10/cifar10_input.py:135: image_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.image. Note that tf.summary.image uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, the max_images argument was renamed to max_outputs.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
WARNING:tensorflow:From /home/ubuntu/pythonworkspace/tensorflowdev/models-master/tutorials/image/cifar10/cifar10_input.py:135: image_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.image. Note that tf.summary.image uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, the max_images argument was renamed to max_outputs.
Traceback (most recent call last):
  File "cifar10_multi_gpu_train.py", line 273, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "cifar10_multi_gpu_train.py", line 269, in main
    train()
  File "cifar10_multi_gpu_train.py", line 210, in train
    variables_averages_op = variable_averages.apply(tf.trainable_variables())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 373, in apply
    colocate_with_primary=True)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 110, in create_slot
    return _create_slot_var(primary, val, "")
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 64, in _create_slot_var
    use_resource=_is_resource(primary))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1034, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 933, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
    validate_shape=validate_shape, use_resource=use_resource)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
    use_resource=use_resource)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 671, in _get_single_variable
    "VarScope?" % name)
ValueError: Variable conv1/weights/ExponentialMovingAverage/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?

Best answer

You can find the answer to this problem here: Issue 6220

You need to put:
with tf.variable_scope(tf.get_variable_scope()):
in front of the loop that runs over your devices...

So, do this:

with tf.variable_scope(tf.get_variable_scope()):
    for i in xrange(FLAGS.num_gpus):
        with tf.device('/gpu:%d' % i):
            # ... the existing per-GPU tower code from the script goes here, unchanged ...

The explanation is given in the link...
Quoted here:

When you do tf.get_variable_scope().reuse_variables() you set the current scope to reuse variables. If you call the optimizer in such scope, it's trying to reuse slot variables, which it cannot find, so it throws an error. If you put a scope around, the tf.get_variable_scope().reuse_variables() only affects that scope, so when you exit it, you're back in the non-reusing mode, the one you want.

Hope that helps, let me know if I should clarify more.
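To make the quoted explanation concrete, here is a minimal pure-Python sketch of the reuse flag. It is a toy model, not TensorFlow's implementation: `ToyVariableStore`, `build_towers`, and `create_averages` are hypothetical names invented for illustration, and the only behavior modeled is what the quote describes (a reuse flag that stays set unless an enclosing scope restores it on exit).

```python
from contextlib import contextmanager


class ToyVariableStore:
    """Toy stand-in for a variable store with a reuse flag (hypothetical, not TensorFlow)."""

    def __init__(self):
        self.variables = {}
        self.reuse = False  # reuse flag of the current scope

    @contextmanager
    def scope(self):
        """Open a nested scope; the reuse flag is restored when the scope exits."""
        saved = self.reuse
        try:
            yield self
        finally:
            self.reuse = saved

    def reuse_variables(self):
        """Switch the current scope to reuse mode."""
        self.reuse = True

    def get_variable(self, name):
        if self.reuse:
            # In reuse mode only existing variables may be fetched.
            if name not in self.variables:
                raise ValueError("Variable %s does not exist" % name)
            return self.variables[name]
        # In non-reuse mode only new variables may be created.
        if name in self.variables:
            raise ValueError("Variable %s already exists" % name)
        self.variables[name] = object()
        return self.variables[name]


def build_towers(store, num_gpus):
    """Mimic the tower loop: create weights once, then share them across towers."""
    for _ in range(num_gpus):
        store.get_variable("conv1/weights")
        store.reuse_variables()


def create_averages(store):
    """Mimic the step that needs a brand-new slot variable."""
    return store.get_variable("conv1/weights/ExponentialMovingAverage")


# Without an enclosing scope the reuse flag leaks, so creating the slot fails.
leaky = ToyVariableStore()
build_towers(leaky, 2)
try:
    create_averages(leaky)
    leaky_error = None
except ValueError as exc:
    leaky_error = str(exc)

# With an enclosing scope the flag is restored on exit, so creation succeeds.
fixed = ToyVariableStore()
with fixed.scope():
    build_towers(fixed, 2)
create_averages(fixed)
```

In this toy, `fixed.scope()` plays the role that `with tf.variable_scope(tf.get_variable_scope()):` plays in the answer: the towers still share their weights inside the block, and the code after the block is back in non-reusing mode.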

This question and its answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/41986583/
