python - TensorFlow multi-GPU example error: Variable conv1/weights/ExponentialMovingAverage/ does not exist

Tags: python tensorflow multi-gpu

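I am running the code from this tutorial: https://www.tensorflow.org/tutorials/deep_cnn/

I downloaded the code from here: https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10/

I am running the code on a g2.4xlarge instance on AWS with Ubuntu 14.04. The single-GPU example runs fine, without any errors. The multi-GPU example fails with the traceback shown below.

Can anyone help me solve this problem? I am running version 0.12.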


ubuntu@ip-xxx-xx-xx-xx:~/pythonworkspace/tensorflowdev/models-master/tutorials/image/cifar10$ python -c 'import tensorflow as tf; print(tf.__version__)'

0.12.head

ubuntu@ip-xxx-xx-xx-xx:~/pythonworkspace/tensorflowdev/models-master/tutorials/image/cifar10$ python cifar10_multi_gpu_train.py --num_gpus=2

>> Downloading cifar-10-binary.tar.gz 100.0%
Successfully downloaded cifar-10-binary.tar.gz 170052171 bytes.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
WARNING:tensorflow:From /home/ubuntu/pythonworkspace/tensorflowdev/models-master/tutorials/image/cifar10/cifar10_input.py:135: image_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.image. Note that tf.summary.image uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, the max_images argument was renamed to max_outputs.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
WARNING:tensorflow:From /home/ubuntu/pythonworkspace/tensorflowdev/models-master/tutorials/image/cifar10/cifar10_input.py:135: image_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.image. Note that tf.summary.image uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, the max_images argument was renamed to max_outputs.
Traceback (most recent call last):
  File "cifar10_multi_gpu_train.py", line 273, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "cifar10_multi_gpu_train.py", line 269, in main
    train()
  File "cifar10_multi_gpu_train.py", line 210, in train
    variables_averages_op = variable_averages.apply(tf.trainable_variables())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 373, in apply
    colocate_with_primary=True)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 110, in create_slot
    return _create_slot_var(primary, val, "")
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 64, in _create_slot_var
    use_resource=_is_resource(primary))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1034, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 933, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
    validate_shape=validate_shape, use_resource=use_resource)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
    use_resource=use_resource)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 671, in _get_single_variable
    "VarScope?" % name)
ValueError: Variable conv1/weights/ExponentialMovingAverage/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?

Best answer

You can find the answer to this problem here: Issue 6220

You need to put:
with tf.variable_scope(tf.get_variable_scope()):
in front of the loop that runs over your devices.

So, do this:

with tf.variable_scope(tf.get_variable_scope()):
    for i in xrange(FLAGS.num_gpus):
        with tf.device('/gpu:%d' % i):
            # the existing tower-building code goes here, unchanged

The explanation is given in the linked issue.
It is quoted here:

When you do tf.get_variable_scope().reuse_variables() you set the current scope to reuse variables. If you call the optimizer in such scope, it's trying to reuse slot variables, which it cannot find, so it throws an error. If you put a scope around, the tf.get_variable_scope().reuse_variables() only affects that scope, so when you exit it, you're back in the non-reusing mode, the one you want.

Hope that helps, let me know if I should clarify more.
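
To make the quoted explanation concrete, here is a small pure-Python sketch of the reuse flag. It is a toy model, not TensorFlow code: `VariableStore`, `scope`, and `build_towers` are hypothetical names made up for illustration, and the real variable store is more involved. It only assumes the behavior described in the quote: `reuse_variables()` flips the current scope into reuse mode, a wrapping scope restores the outer mode on exit, and creating a new variable (such as the ExponentialMovingAverage slot) fails while reuse mode is on.

```python
from contextlib import contextmanager


class VariableStore(object):
    """Toy stand-in for TensorFlow's variable scoping (hypothetical, not the real API)."""

    def __init__(self):
        self.variables = {}
        self.reuse_stack = [False]  # the root scope starts in non-reusing mode

    @property
    def reuse(self):
        return self.reuse_stack[-1]

    def reuse_variables(self):
        # Models tf.get_variable_scope().reuse_variables(): affects the current scope only.
        self.reuse_stack[-1] = True

    @contextmanager
    def scope(self):
        # Models `with tf.variable_scope(tf.get_variable_scope()):`.
        # The inner scope inherits the flag; the outer flag is restored on exit.
        self.reuse_stack.append(self.reuse_stack[-1])
        try:
            yield
        finally:
            self.reuse_stack.pop()

    def get_variable(self, name):
        if self.reuse:
            if name not in self.variables:
                raise ValueError("Variable %s does not exist" % name)
            return self.variables[name]
        if name in self.variables:
            raise ValueError("Variable %s already exists" % name)
        self.variables[name] = object()
        return self.variables[name]


def build_towers(store, num_gpus, wrap_in_scope):
    def tower_loop():
        for _ in range(num_gpus):
            store.get_variable("conv1/weights")  # created once, then shared
            store.reuse_variables()              # later towers reuse the weights

    if wrap_in_scope:
        with store.scope():
            tower_loop()
    else:
        tower_loop()
    # Stands in for variable_averages.apply(...), which needs a NEW slot variable.
    return store.get_variable("conv1/weights/ExponentialMovingAverage")


if __name__ == "__main__":
    try:
        build_towers(VariableStore(), 2, wrap_in_scope=False)
    except ValueError as err:
        print("without wrapping scope: %s" % err)

    build_towers(VariableStore(), 2, wrap_in_scope=True)
    print("with wrapping scope: slot variable created")
```

In the unwrapped case the root scope is still in reuse mode when the slot variable is requested, which is the situation behind the ValueError in the question. With the wrapping scope, reuse mode ends together with the `with` block.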

Regarding "python - TensorFlow multi-GPU example error: Variable conv1/weights/ExponentialMovingAverage/ does not exist", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41986583/
