machine-learning - Does TensorFlow use all available GPUs in the machine by default?

Tags: machine-learning computer-vision gpu tensorflow

I have 3 GTX Titan GPUs in my machine. I ran the example provided for Cifar10 using cifar10_train.py and got the following output:

I tensorflow/core/common_runtime/gpu/gpu_init.cc:60] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_init.cc:60] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:127] DMA: 0 1 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:137] 0:   Y N 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:137] 1:   N Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:694] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:694] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX TITAN, pci bus id: 0000:84:00.0)

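It looks to me as though TensorFlow is trying to initialize itself on two devices (gpu0 and gpu1).

My question is why it does this on only two devices, and whether there is any way to prevent it. (I only want it to run as if there were a single GPU.)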

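Best answer

See: Using GPUs

Manual device placement

If you would like a particular operation to run on a device of your choice instead of the one selected for you automatically, you can use tf.device to create a device context, so that all operations within that context have the same device assignment.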

# Creates a graph.
with tf.device('/cpu:0'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

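You will see that a and b are now assigned to cpu:0. Since no device was explicitly specified for the MatMul operation, the TensorFlow runtime chooses one based on the operation and the available devices (gpu:0 in this example) and automatically copies tensors between devices as needed.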

Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
[[ 22.  28.]
 [ 49.  64.]]

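Previous answer 2.

See: Using GPUs

Using a single GPU on a multi-GPU system

If you have more than one GPU in your system, the GPU with the lowest ID is selected by default. If you would like to run on a different GPU, you need to specify the preference explicitly: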

# Creates a graph.
with tf.device('/gpu:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

Previous answer 1.

From CUDA_VISIBLE_DEVICES – Masking GPUs

Does your CUDA application need to target a specific GPU? If you are writing GPU enabled code, you would typically use a device query to select the desired GPUs. However, a quick and easy solution for testing is to use the environment variable CUDA_VISIBLE_DEVICES to restrict the devices that your CUDA application sees. This can be useful if you are attempting to share resources on a node or you want your GPU enabled executable to target a specific GPU.

Environment Variable Syntax      Results
CUDA_VISIBLE_DEVICES=1           Only device 1 will be seen
CUDA_VISIBLE_DEVICES=0,1         Devices 0 and 1 will be visible
CUDA_VISIBLE_DEVICES="0,1"       Same as above, quotation marks are optional
CUDA_VISIBLE_DEVICES=0,2,3       Devices 0, 2, 3 will be visible; device 1 is masked

CUDA will enumerate the visible devices starting at zero. In the last case, devices 0, 2, 3 will appear as devices 0, 1, 2. If you change the order of the string to “2,3,0”, devices 2,3,0 will be enumerated as 0,1,2 respectively. If CUDA_VISIBLE_DEVICES is set to a device that does not exist, all devices will be masked. You can specify a mix of valid and invalid device numbers. All devices before the invalid value will be enumerated, while all devices after the invalid value will be masked.
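
The enumeration rules in the paragraph above can be modelled in a few lines of Python. This is only a sketch of the rules as quoted, not CUDA's actual parser: the function name enumerate_visible and the physical_count parameter are made up for illustration, and it handles plain integer IDs only.

```python
def enumerate_visible(spec, physical_count):
    """Return the physical device ids in the order they would be renumbered 0, 1, 2, ...

    Hypothetical helper: models only the rules quoted above.
    """
    visible = []
    for token in spec.split(","):
        token = token.strip()
        # An entry that is not an existing device masks itself and everything after it.
        if not token.isdigit() or int(token) >= physical_count:
            break
        visible.append(int(token))
    return visible


# Position in the returned list = device id seen by the application.
print(enumerate_visible("0,2,3", 4))  # [0, 2, 3]
print(enumerate_visible("2,3,0", 4))  # [2, 3, 0]
print(enumerate_visible("0,9,1", 4))  # [0]
print(enumerate_visible("7", 4))      # []
```

With "2,3,0", for instance, physical device 2 is what the application sees as device 0.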

To determine the device ID for the available hardware in your system, you can run NVIDIA’s deviceQuery executable included in the CUDA SDK. Happy programming!

Chris Mason
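
Applied to the question above, one way to make TensorFlow behave as if the machine had a single GPU is to set this variable before TensorFlow is imported. The following is a minimal sketch; the helper name mask_gpus is made up for illustration, and the choice of GPU 0 is an assumption.

```python
import os


def mask_gpus(device_ids):
    """Set CUDA_VISIBLE_DEVICES so that only the given physical GPU ids are visible.

    Hypothetical helper; call it before any CUDA-using library is imported.
    """
    value = ",".join(str(i) for i in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Assumption: physical GPU 0 is the one to keep.
mask_gpus([0])

# Import TensorFlow only after the mask is set, for example:
# import tensorflow as tf
```

The same effect can be had from the shell without changing the script, for example CUDA_VISIBLE_DEVICES=0 python cifar10_train.py.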

Regarding "machine-learning - Does TensorFlow use all available GPUs in the machine by default?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34834714/
