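I am running the code from this tutorial: https://www.tensorflow.org/tutorials/deep_cnn/
I downloaded the code from here: https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10/
I am running it on a g2.4xlarge instance on AWS with Ubuntu 14.04. The single-GPU example runs fine, without any errors.
The multi-GPU example, however, fails with the traceback below. Can anyone help me resolve this? I am running version 0.12.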
ubuntu@ip-xxx-xx-xx-xx:~/pythonworkspace/tensorflowdev/models-master/tutorials/image/cifar10$ python -c 'import tensorflow as tf; print(tf.__version__)'
0.12.head
ubuntu@ip-xxx-xx-xx-xx:~/pythonworkspace/tensorflowdev/models-master/tutorials/image/cifar10$ python cifar10_multi_gpu_train.py --num_gpus=2
>> Downloading cifar-10-binary.tar.gz 100.0%
Successfully downloaded cifar-10-binary.tar.gz 170052171 bytes.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
WARNING:tensorflow:From /home/ubuntu/pythonworkspace/tensorflowdev/models-master/tutorials/image/cifar10/cifar10_input.py:135: image_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.image. Note that tf.summary.image uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, the max_images argument was renamed to max_outputs.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
WARNING:tensorflow:From /home/ubuntu/pythonworkspace/tensorflowdev/models-master/tutorials/image/cifar10/cifar10_input.py:135: image_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.image. Note that tf.summary.image uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, the max_images argument was renamed to max_outputs.
Traceback (most recent call last):
  File "cifar10_multi_gpu_train.py", line 273, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "cifar10_multi_gpu_train.py", line 269, in main
    train()
  File "cifar10_multi_gpu_train.py", line 210, in train
    variables_averages_op = variable_averages.apply(tf.trainable_variables())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 373, in apply
    colocate_with_primary=True)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 110, in create_slot
    return _create_slot_var(primary, val, "")
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 64, in _create_slot_var
    use_resource=_is_resource(primary))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1034, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 933, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
    validate_shape=validate_shape, use_resource=use_resource)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
    use_resource=use_resource)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 671, in _get_single_variable
    "VarScope?" % name)
ValueError: Variable conv1/weights/ExponentialMovingAverage/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?
Best answer
You can find the answer to this problem here: Issue 6220
You need to put:
with tf.variable_scope(tf.get_variable_scope()):
in front of the loop that runs over your devices...
So, do this:
with tf.variable_scope(tf.get_variable_scope()):
    for i in xrange(FLAGS.num_gpus):
        with tf.device('/gpu:%d' % i):
The explanation is given in the link...
Quoted here:
When you do tf.get_variable_scope().reuse_variables() you set the current scope to reuse variables. If you call the optimizer in such scope, it's trying to reuse slot variables, which it cannot find, so it throws an error. If you put a scope around, the tf.get_variable_scope().reuse_variables() only affects that scope, so when you exit it, you're back in the non-reusing mode, the one you want.
Hope that helps, let me know if I should clarify more.
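To make the quoted explanation concrete, here is a small pure-Python toy model of the behaviour it describes. It is only a sketch and does not use TensorFlow: ToyVarStore, build and try_build are hypothetical names invented for this illustration, and the real variable-scope machinery is more involved. The toy only captures two things from the quote: once reuse is switched on, asking for a variable that does not exist yet raises an error, and a surrounding scope restores the previous reuse setting on exit.

```python
import contextlib


class ToyVarStore(object):
    """Toy stand-in for a variable store with a reuse flag (NOT the TensorFlow API)."""

    def __init__(self):
        self.variables = {}
        self.reuse = False

    def get_variable(self, name):
        if self.reuse:
            # In reuse mode only existing variables may be fetched.
            if name not in self.variables:
                raise ValueError("Variable %s does not exist" % name)
            return self.variables[name]
        # In non-reuse mode a variable may only be created once.
        if name in self.variables:
            raise ValueError("Variable %s already exists" % name)
        self.variables[name] = object()
        return self.variables[name]

    def reuse_variables(self):
        # Sticky: stays on until an enclosing scope is exited.
        self.reuse = True

    @contextlib.contextmanager
    def variable_scope(self):
        saved = self.reuse
        try:
            yield
        finally:
            # Leaving the scope puts the reuse flag back the way it was.
            self.reuse = saved


def build(store, num_gpus, wrap_in_scope):
    """Mimic the tower loop followed by creation of a moving-average slot."""
    scope = store.variable_scope() if wrap_in_scope else contextlib.nullcontext()
    with scope:
        for i in range(num_gpus):
            # Tower 0 creates the weights; later towers reuse them.
            store.get_variable("conv1/weights")
            store.reuse_variables()
    # After the loop a brand-new slot variable has to be created.
    store.get_variable("conv1/weights/ExponentialMovingAverage")


def try_build(wrap_in_scope):
    """Return "ok" or the error message for one build attempt with two towers."""
    try:
        build(ToyVarStore(), 2, wrap_in_scope)
        return "ok"
    except ValueError as err:
        return "error: %s" % err


if __name__ == "__main__":
    print("without enclosing scope:", try_build(False))
    print("with enclosing scope:   ", try_build(True))
```

Without the enclosing scope the reuse flag is still on when the slot variable is requested, so the toy fails in the same way as the traceback in the question. With the enclosing scope the flag is reset when the loop's scope exits, and the slot variable can be created.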
Source: Stack Overflow, "python - Tensorflow multi-GPU example error: Variable conv1/weights/ExponentialMovingAverage/ does not exist": https://stackoverflow.com/questions/41986583/