python - Tensorflow-gpu 1.15 not using the GPU

Tags: python tensorflow ubuntu-20.04

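My system runs Ubuntu 20.04, so getting the right combination of CUDA and cudnn for Tensorflow turned out to be a bit tricky. I tried CUDA 11 but could not get cudnn to work, so I installed CUDA 10.1 via sudo apt install nvidia-cuda-toolkit, together with the matching cudnn (7.6.5) (some helpful answers). Now, when I install Tensorflow-gpu 2, I can easily check whether it is using the GPU: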

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU'))) 

This gives the correct output, 2. However, I need to use Tensorflow-gpu 1.15. With that version, following the answer in this SO post, I tried the following:

import tensorflow as tf
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

with tf.Session() as sess:
    print(sess.run(c))


2020-07-11 14:05:53.181428: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-11 14:05:53.183404: I tensorflow/core/common_runtime/gpu/] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
2020-07-11 14:05:53.183598: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-11 14:05:53.185222: I tensorflow/core/common_runtime/gpu/] Found device 1 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:02:00.0
2020-07-11 14:05:53.185548: W tensorflow/stream_executor/platform/default/] Could not load dynamic library ''; dlerror: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/cuda/include:/usr/lib/cuda/lib64:
2020-07-11 14:05:53.185790: W tensorflow/stream_executor/platform/default/] Could not load dynamic library ''; dlerror: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/cuda/include:/usr/lib/cuda/lib64:
2020-07-11 14:05:53.186015: W tensorflow/stream_executor/platform/default/] Could not load dynamic library ''; dlerror: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/cuda/include:/usr/lib/cuda/lib64:
2020-07-11 14:05:53.186237: W tensorflow/stream_executor/platform/default/] Could not load dynamic library ''; dlerror: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/cuda/include:/usr/lib/cuda/lib64:
2020-07-11 14:05:53.186459: W tensorflow/stream_executor/platform/default/] Could not load dynamic library ''; dlerror: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/cuda/include:/usr/lib/cuda/lib64:
2020-07-11 14:05:53.186578: W tensorflow/stream_executor/platform/default/] Could not load dynamic library ''; dlerror: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/cuda/include:/usr/lib/cuda/lib64:
2020-07-11 14:05:53.186594: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2020-07-11 14:05:53.186601: W tensorflow/core/common_runtime/gpu/] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-07-11 14:05:53.187652: I tensorflow/core/common_runtime/gpu/] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-11 14:05:53.187669: I tensorflow/core/common_runtime/gpu/]      
Traceback (most recent call last):
  File "/home/mo/anaconda3/envs/lf/lib/python3.6/site-packages/tensorflow_core/python/client/", line 1365, in _do_call
return fn(*args)
  File "/home/mo/anaconda3/envs/lf/lib/python3.6/site-packages/tensorflow_core/python/client/", line 1348, in _run_fn
  File "/home/mo/anaconda3/envs/lf/lib/python3.6/site-packages/tensorflow_core/python/client/", line 1388, in _extend_graph
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation MatMul: {{node MatMul}} was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:1 ]. Make sure the device specification refers to a valid device.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/mo/anaconda3/envs/lf/lib/python3.6/site-packages/tensorflow_core/python/client/", line 956, in run
  File "/home/mo/anaconda3/envs/lf/lib/python3.6/site-packages/tensorflow_core/python/client/", line 1180, in _run
feed_dict_tensor, options, run_metadata)
  File "/home/mo/anaconda3/envs/lf/lib/python3.6/site-packages/tensorflow_core/python/client/", line 1359, in _do_run
  File "/home/mo/anaconda3/envs/lf/lib/python3.6/site-packages/tensorflow_core/python/client/", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation MatMul: node MatMul (defined at /home/mo/anaconda3/envs/lf/lib/python3.6/site-packages/tensorflow_core/python/framework/  was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:1 ]. Make sure the device specification refers to a valid device.



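I had a similar problem getting CUDA to work. My solution was to downgrade to Ubuntu 18.04 and make sure I had the right combination of gcc, CUDA, and Tensorflow, as listed in the tested build configurations.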
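
As a rough illustration of what "the right combination" means, here is a minimal sketch that compares an installed CUDA version with the one a given Tensorflow release was built against. The names `TESTED_GPU_BUILDS` and `check_combo` are hypothetical, and the version numbers are assumptions recalled from the tested build configurations table for Linux GPU builds, so verify them against the official table before relying on them.

```python
# Assumed values recalled from TensorFlow's tested build configurations
# (Linux, GPU). Verify against the official table before relying on them.
TESTED_GPU_BUILDS = {
    "1.15": {"cuda": "10.0", "cudnn": "7.4", "gcc": "7.3.1"},
    "2.1": {"cuda": "10.1", "cudnn": "7.6", "gcc": "7.3.1"},
}


def major_minor(version):
    """Reduce a version string such as '10.1.243' to 'major.minor'."""
    return ".".join(version.split(".")[:2])


def check_combo(tf_version, installed_cuda):
    """Return (matches, expected_cuda) for a TensorFlow version.

    Returns (None, None) when the version is not in the table above.
    """
    row = TESTED_GPU_BUILDS.get(major_minor(tf_version))
    if row is None:
        return None, None
    return major_minor(installed_cuda) == row["cuda"], row["cuda"]


if __name__ == "__main__":
    # The setup from the question: Tensorflow 1.15 with CUDA 10.1 installed.
    matches, expected = check_combo("1.15.0", "10.1")
    print("matches:", matches, "expected CUDA:", expected)
```

If the table values are right, this would explain why Tensorflow 2 finds both GPUs with CUDA 10.1 while 1.15 only registers the CPU and XLA devices.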

The original write-up of my solution is documented in this StackOverflow question:
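
Separately, for the "Could not load dynamic library" warnings in the question, a quick way to see which files are missing is to look for them in the directories on LD_LIBRARY_PATH. This is only a sketch: the library names are stripped from the log above, so the list `ASSUMED_LIBS` is my assumption of what a CUDA 10.0 build of Tensorflow looks for, and `find_missing_libs` is a hypothetical helper. The real dynamic loader also consults other locations (for example the ldconfig cache), which this check ignores.

```python
import os

# Assumption: the names a CUDA 10.0 build of Tensorflow tries to load.
# Replace these with the names printed in your own log.
ASSUMED_LIBS = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
    "libcudnn.so.7",
]


def find_missing_libs(lib_names, search_path):
    """Return the names that are not files in any directory of search_path.

    search_path uses the same separator-delimited format as LD_LIBRARY_PATH.
    """
    dirs = [d for d in search_path.split(os.pathsep) if d]
    missing = []
    for name in lib_names:
        if not any(os.path.isfile(os.path.join(d, name)) for d in dirs):
            missing.append(name)
    return missing


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    for name in find_missing_libs(ASSUMED_LIBS, path):
        print("not found on LD_LIBRARY_PATH:", name)
```

Any name it reports is a candidate for a version mismatch between the installed CUDA toolkit and the one Tensorflow expects.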
