Tensorflow:尝试分配 3.90GiB 时内存不足。调用者表明这不是失败

有一个问题我不明白。

Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.90GiB. 
The caller indicates that this is not a failure, 
but may mean that there could be performance gains if more memory is available.

这句话的意思是什么？

我已经阅读了源代码。但我能力差，看不懂。
GPU的显存大小为6GB，我使用tfprof分析的显存使用结果约为14GB。这超出了 GPU 的内存大小。 这句话是显示天气tensorflow分配CPU内存还是使用关于GPU内存使用的好算法？

The version of tensorflow that I use is 1.2.

GPU信息如下:

名称:GeForce GTX TITAN Z
主要:3 次要:5 内存时钟频率 (GHz) 0.8755
总内存:5.94GiB
可用内存:5.87GiB

我的代码:

#!/usr/bin/python3.4

import tensorflow as tf
import tensorflow.examples.tutorials.mnist.input_data as input_data
import os
os.environ["CUDA_VISIBLE_DEVICES"] = '0' 


mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
sess = tf.InteractiveSession(config=tf.ConfigProto(log_device_placement=True))
#sess = tf.InteractiveSession()
def weight_variable(shape):
    init = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(init)

def bias_variable(shape):
    init = tf.constant(0.1, shape=shape)
    return tf.Variable(init)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
x_image = tf.reshape(x, [-1, 28, 28, 1])


W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)


W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)


W_f1 = weight_variable([7*7*64, 1024])
b_f1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_f1) + b_f1)


keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

W_f2 = weight_variable([1024, 10])
b_f2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_f2) + b_f2)

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


test_images = tf.placeholder(tf.float32, [None, 784])
test_labels = tf.placeholder(tf.float32, [None, 10])


tf.global_variables_initializer().run()

run_metadata = tf.RunMetadata()


for i in range(100):
    batch = mnist.train.next_batch(10000)
    if (i%10 == 0):  
        train_accurancy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob : 1.0})
        print("step %d, traning accurancy %g" % (i, train_accurancy))
    sess.run(train_step, feed_dict={x: batch[0], y_: batch[1], keep_prob : 0.5}, options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE), run_metadata=run_metadata)

tf.contrib.tfprof.model_analyzer.print_model_analysis(
    tf.get_default_graph(),
    run_meta=run_metadata,
    tfprof_options=tf.contrib.tfprof.model_analyzer.PRINT_ALL_TIMING_MEMORY)

test_images = mnist.test.images[0:300, :]
test_labels = mnist.test.labels[0:300, :]
print("test accuracy %g" % accuracy.eval({x: test_images, y_: test_labels, keep_prob: 1.0}))

警告:

2017-08-10 21:37:44.589635: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.90GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-08-10 21:37:46.208897: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.61GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.

tfprof 的结果:

==================Model Analysis Report======================
_TFProfRoot (0B/14854.97MB, 0us/7.00ms)

最佳答案

您正在使用 GPU，并且您的批量大小为 1000，这对于 10 个类来说已经很多了!制作较小的批量大小，例如 10to20，并将范围扩大到 10e4 甚至 10e3。这个问题很好known 。如果您确实想使用 10000 作为批量大小，请告诉 TensorFlow 使用 CPU使用:

tf.device('/cpu:0')

关于Tensorflow:尝试分配 3.90GiB 时内存不足。调用者表明这不是失败，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/45625691/

Tensorflow:尝试分配 3.90GiB 时内存不足。调用者表明这不是失败

上一篇：numpy - tensorflow batch_matmul如何工作？

下一篇：tensorflow - tensorflow 模型的设计模式