amazon-web-services - TensorFlow not using GPU

Tags: amazon-web-services tensorflow gpu

I launched an AWS deep learning machine from an AMI. Now I am trying to run the simple getting-started example from TensorFlow:

import tensorflow as tf

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

But my machine does not seem to be using my GPU:

MatMul_2: (MatMul): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830238: I tensorflow/core/common_runtime/simple_placer.cc:847] MatMul_2: (MatMul)/job:localhost/replica:0/task:0/cpu:0
MatMul_1: (MatMul): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830259: I tensorflow/core/common_runtime/simple_placer.cc:847] MatMul_1: (MatMul)/job:localhost/replica:0/task:0/cpu:0
MatMul: (MatMul): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830271: I tensorflow/core/common_runtime/simple_placer.cc:847] MatMul: (MatMul)/job:localhost/replica:0/task:0/cpu:0
b_2: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830283: I tensorflow/core/common_runtime/simple_placer.cc:847] b_2: (Const)/job:localhost/replica:0/task:0/cpu:0
a_2: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830312: I tensorflow/core/common_runtime/simple_placer.cc:847] a_2: (Const)/job:localhost/replica:0/task:0/cpu:0
b_1: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830324: I tensorflow/core/common_runtime/simple_placer.cc:847] b_1: (Const)/job:localhost/replica:0/task:0/cpu:0
a_1: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830337: I tensorflow/core/common_runtime/simple_placer.cc:847] a_1: (Const)/job:localhost/replica:0/task:0/cpu:0
b: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830348: I tensorflow/core/common_runtime/simple_placer.cc:847] b: (Const)/job:localhost/replica:0/task:0/cpu:0
a: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830358: I tensorflow/core/common_runtime/simple_placer.cc:847] a: (Const)/job:localhost/replica:0/task:0/cpu:0
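A placement log like this gets long quickly, so it can help to tally ops per device instead of reading every entry. The following is only a rough sketch: the function name `count_ops_per_device` and the regular expression are my own assumptions, derived from the `name: (Type): /device/path` entries in the log, and they expect one entry per line as TensorFlow prints them.

```python
import re
from collections import Counter

# Assumed entry format, e.g.
#   "MatMul: (MatMul): /job:localhost/replica:0/task:0/cpu:0"
# Timestamped duplicates of each entry do not match and are skipped.
PLACEMENT_RE = re.compile(r"^(?P<op>\S+): \((?P<kind>\w+)\): (?P<device>/\S+)$")


def count_ops_per_device(log_text):
    """Count how many ops were placed on each device (hypothetical helper)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PLACEMENT_RE.match(line.strip())
        if match:
            # Keep only the last path component, lowercased, e.g. "cpu:0".
            device = match.group("device").rsplit("/", 1)[-1].lower()
            counts[device] += 1
    return counts


sample = """MatMul: (MatMul): /job:localhost/replica:0/task:0/cpu:0
b: (Const): /job:localhost/replica:0/task:0/cpu:0
a: (Const): /job:localhost/replica:0/task:0/cpu:0"""

for device, n_ops in count_ops_per_device(sample).items():
    print(device, n_ops)
```

If no key containing `gpu` shows up in the result, every op ran on the CPU, which is the situation described here.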

If I try to specify the GPU manually using with tf.device('/gpu:0'):, I get the following error:

InvalidArgumentError: Cannot assign a device for operation 'MatMul_3': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
     [[Node: MatMul_3 = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/device:GPU:0"](a_3, b_3)]]

The only change I made to the AMI was updating TensorFlow to the latest version.

This is what I see when I run watch nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:00:1E.0     Off |                    0 |
| N/A   44C    P8    27W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Best answer

1. Check your instance: did you select a GPU instance type?
Use "watch nvidia-smi" to view the GPU information.

2. Check your AMI and TensorFlow version: the installed TensorFlow build may not support the GPU, or it may be misconfigured.

I use this AMI: Deep Learning AMI Amazon Linux (ami-296e7850).
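Both checks can be collected from Python in one place. This is a minimal sketch rather than a definitive diagnostic: the helper name `gpu_diagnostics` and the report keys are my own, and it assumes that `tf.test.is_built_with_cuda()` and `device_lib.list_local_devices()` are available in the installed TensorFlow. If TensorFlow is not installed at all, the TensorFlow-specific fields are left empty.

```python
import importlib.util
import shutil


def gpu_diagnostics():
    """Collect basic facts about GPU visibility (hypothetical helper)."""
    report = {
        # Step 1: is the NVIDIA driver tooling present on this instance?
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        # Step 2: is TensorFlow installed, and is it a CUDA-enabled build?
        "tensorflow_installed": importlib.util.find_spec("tensorflow") is not None,
        "built_with_cuda": None,
        "devices": [],
    }
    if report["tensorflow_installed"]:
        import tensorflow as tf
        from tensorflow.python.client import device_lib

        report["built_with_cuda"] = tf.test.is_built_with_cuda()
        # Names of the devices TensorFlow can actually see.
        report["devices"] = [d.name for d in device_lib.list_local_devices()]
    return report


if __name__ == "__main__":
    for key, value in gpu_diagnostics().items():
        print(key, "=", value)
```

If `nvidia_smi_on_path` is true but `built_with_cuda` is false, one likely explanation is that a CPU-only TensorFlow package is installed, for example after upgrading with `pip install --upgrade tensorflow` instead of `tensorflow-gpu`.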

This question and its answer come from Stack Overflow: https://stackoverflow.com/questions/44992011/
