python - Tensorflow-gpu 1.15 不使用 GPU

我的系统安装了 Ubuntu20.04，因此为 Tensorflow 获取 CUDA 和 cudnn 的正确组合似乎有点棘手。我尝试了 CUDA11 但无法让 cudnn 工作，所以我通过 sudo apt install nvidia-cuda-toolkit 安装了 CUDA10.1和相应的 cudnn (7.6.5) ( some helpful answers )。现在，当我安装 Tensorflow-gpu 2 时，我可以轻松检查它是否使用 GPU:

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

给出了正确的输出 2 。但是我需要使用 Tensorflow-gpu-1.15。有了这个，我根据答案 in this SO post 尝试了以下操作:

import tensorflow as tf
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

with tf.Session() as sess:
    print (sess.run(c))

给出了以下输出:

2020-07-11 14:05:53.181428: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-11 14:05:53.183404: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
2020-07-11 14:05:53.183598: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-11 14:05:53.185222: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:02:00.0
2020-07-11 14:05:53.185548: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/cuda/include:/usr/lib/cuda/lib64:
2020-07-11 14:05:53.185790: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/cuda/include:/usr/lib/cuda/lib64:
2020-07-11 14:05:53.186015: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/cuda/include:/usr/lib/cuda/lib64:
2020-07-11 14:05:53.186237: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/cuda/include:/usr/lib/cuda/lib64:
2020-07-11 14:05:53.186459: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/cuda/include:/usr/lib/cuda/lib64:
2020-07-11 14:05:53.186578: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/cuda/include:/usr/lib/cuda/lib64:
2020-07-11 14:05:53.186594: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-07-11 14:05:53.186601: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-07-11 14:05:53.187652: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-11 14:05:53.187669: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      
Traceback (most recent call last):
  File "/home/mo/anaconda3/envs/lf/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
  File "/home/mo/anaconda3/envs/lf/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1348, in _run_fn
self._extend_graph()
  File "/home/mo/anaconda3/envs/lf/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1388, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation MatMul: {{node MatMul}} was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:1 ]. Make sure the device specification refers to a valid device.
 [[MatMul]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/mo/anaconda3/envs/lf/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
  File "/home/mo/anaconda3/envs/lf/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
  File "/home/mo/anaconda3/envs/lf/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
  File "/home/mo/anaconda3/envs/lf/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation MatMul: node MatMul (defined at /home/mo/anaconda3/envs/lf/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748)  was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:1 ]. Make sure the device specification refers to a valid device.
 [[MatMul]]

我无法理解这个问题，我是否需要不同的CUDA版本(10.0可能是因为缺少库)，或者我应该更改tf.device中的设备名称，我不确定CUDA10.0是否可以安装在ubuntu20上，所以可以安装旧的ubuntu版本吗？

最佳答案

我在让 CUDA 工作时遇到了类似的问题。我的解决方案是降级到 Ubuntu 18.04，并确保我拥有测试构建配置中列出的 gcc、CUDA 和 Tensorflow 的正确组合:

https://www.tensorflow.org/install/source#tested_build_configurations

我的解决方案的原始内容记录在这个 StackOverflow 问题中:

ImportError: libcublas.so.10.0: cannot open shared object file: No such file or director

关于python - Tensorflow-gpu 1.15 不使用 GPU，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/62847255/

python - Tensorflow-gpu 1.15 不使用 GPU

上一篇：webpack - 如何从rails 6和Webpack中的布局添加javacript_pack_tag中的子目录？

下一篇：python - 使用 pandas 根据查找表为列分配值