python - TensorFlow 1.8 GPU version does not seem to use the GPU on Windows

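I am trying to use an AlexNet CNN in TensorFlow. I can train the model without any error messages, I can access TensorBoard, and I can test the trained model without errors.

The only problem is that during training the GPU usage mostly stays at 0%; it occasionally fluctuates up to 25%, but rarely. Meanwhile all of my CPU cores are working flat out, at over 90%. So I assume it is using the CPU instead of the GPU.

Here is my setup: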

Windows 8.1 x64
GPU 1070 driver version 3.88
tensorflow-gpu 1.8.0
CUDA toolkit v9.0
cuDNN version 7

I can import tensorflow in Python without errors, and I ran a few tests to check whether it is installed correctly.

Test 1

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())  

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 625346735515728619
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 6911164212
locality {
  bus_id: 1
  links {
  }
}
incarnation: 15764160474642097170
physical_device_desc: "device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1"
]

Test 2

import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

b'Hello, TensorFlow!'

Test 3

import ctypes
import imp
import sys

def main():
  try:
    import tensorflow as tf
    print("TensorFlow successfully installed.")
    if tf.test.is_built_with_cuda():
      print("The installed version of TensorFlow includes GPU support.")
    else:
      print("The installed version of TensorFlow does not include GPU support.")
    sys.exit(0)
  except ImportError:
    print("ERROR: Failed to import the TensorFlow module.")

  candidate_explanation = False

  python_version = sys.version_info.major, sys.version_info.minor
  print("\n- Python version is %d.%d." % python_version)
  if not (python_version == (3, 5) or python_version == (3, 6)):
    candidate_explanation = True
    print("- The official distribution of TensorFlow for Windows requires "
          "Python version 3.5 or 3.6.")

  try:
    _, pathname, _ = imp.find_module("tensorflow")
    print("\n- TensorFlow is installed at: %s" % pathname)
  except ImportError:
    candidate_explanation = False
    print("""
- No module named TensorFlow is installed in this Python environment. You may
  install it using the command `pip install tensorflow`.""")

  try:
    msvcp140 = ctypes.WinDLL("msvcp140.dll")
  except OSError:
    candidate_explanation = True
    print("""
- Could not load 'msvcp140.dll'. TensorFlow requires that this DLL be
  installed in a directory that is named in your %PATH% environment
  variable. You may install this DLL by downloading Microsoft Visual
  C++ 2015 Redistributable Update 3 from this URL:
  https://www.microsoft.com/en-us/download/details.aspx?id=53587""")

  try:
    cudart64_80 = ctypes.WinDLL("cudart64_80.dll")
  except OSError:
    candidate_explanation = True
    print("""
- Could not load 'cudart64_80.dll'. The GPU version of TensorFlow
  requires that this DLL be installed in a directory that is named in
  your %PATH% environment variable. Download and install CUDA 8.0 from
  this URL: https://developer.nvidia.com/cuda-toolkit""")

  try:
    nvcuda = ctypes.WinDLL("nvcuda.dll")
  except OSError:
    candidate_explanation = True
    print("""
- Could not load 'nvcuda.dll'. The GPU version of TensorFlow requires that
  this DLL be installed in a directory that is named in your %PATH%
  environment variable. Typically it is installed in 'C:\Windows\System32'.
  If it is not present, ensure that you have a CUDA-capable GPU with the
  correct driver installed.""")

  cudnn5_found = False
  try:
    cudnn5 = ctypes.WinDLL("cudnn64_5.dll")
    cudnn5_found = True
  except OSError:
    candidate_explanation = True
    print("""
- Could not load 'cudnn64_5.dll'. The GPU version of TensorFlow
  requires that this DLL be installed in a directory that is named in
  your %PATH% environment variable. Note that installing cuDNN is a
  separate step from installing CUDA, and it is often found in a
  different directory from the CUDA DLLs. You may install the
  necessary DLL by downloading cuDNN 5.1 from this URL:
  https://developer.nvidia.com/cudnn""")

  cudnn6_found = False
  try:
    cudnn = ctypes.WinDLL("cudnn64_6.dll")
    cudnn6_found = True
  except OSError:
    candidate_explanation = True

  if not cudnn5_found or not cudnn6_found:
    print()
    if not cudnn5_found and not cudnn6_found:
      print("- Could not find cuDNN.")
    elif not cudnn5_found:
      print("- Could not find cuDNN 5.1.")
    else:
      print("- Could not find cuDNN 6.")
      print("""
  The GPU version of TensorFlow requires that the correct cuDNN DLL be installed
  in a directory that is named in your %PATH% environment variable. Note that
  installing cuDNN is a separate step from installing CUDA, and it is often
  found in a different directory from the CUDA DLLs. The correct version of
  cuDNN depends on your version of TensorFlow:

  * TensorFlow 1.2.1 or earlier requires cuDNN 5.1. ('cudnn64_5.dll')
  * TensorFlow 1.3 or later requires cuDNN 6. ('cudnn64_6.dll')

  You may install the necessary DLL by downloading cuDNN from this URL:
  https://developer.nvidia.com/cudnn""")

  if not candidate_explanation:
    print("""
- All required DLLs appear to be present. Please open an issue on the
  TensorFlow GitHub page: https://github.com/tensorflow/tensorflow/issues""")

  sys.exit(-1)

if __name__ == "__main__":
  main()

I get:

TensorFlow successfully installed.
The installed version of TensorFlow includes GPU support.

Training from Python

When I start training from Python I get a few warnings, but from what I found when searching, it seems I can ignore them.

curses is not supported on this machine (please install/reinstall curses for an optimal experience)
WARNING:tensorflow:From C:\Users\Jay\AppData\Local\Programs\Python\Python36\lib\site-packages\tflearn\initializations.py:119: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
WARNING:tensorflow:From C:\Users\Jay\AppData\Local\Programs\Python\Python36\lib\site-packages\tflearn\objectives.py:66: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
---------------------------------
Run id: pygta5-car-fast-0.001-alexnetv2-10-epochs-300K-data.model
Log directory: log/
---------------------------------
Training samples: 1500
Validation samples: 500
--
Training Step: 1  | time: 2.863s
| Momentum | epoch: 001 | loss: 0.00000 - acc: 0.0000 -- iter: 0064/1500
Training Step: 2  | total loss: 1.73151 | time: 4.523s

Training from cmd

When I train from cmd, I get slightly different messages:

curses is not supported on this machine (please install/reinstall curses for an optimal experience)
WARNING:tensorflow:From C:\Users\Jay\AppData\Local\Programs\Python\Python36\lib\site-packages\tflearn\initializations.py:119: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
WARNING:tensorflow:From C:\Users\Jay\AppData\Local\Programs\Python\Python36\lib\site-packages\tflearn\objectives.py:66: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
2018-05-13 19:13:07.272665: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-05-13 19:13:07.749663: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.7845
pciBusID: 0000:01:00.0
totalMemory: 8.00GiB freeMemory: 6.77GiB
2018-05-13 19:13:07.766329: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-13 19:13:08.295258: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-13 19:13:08.310539: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929]      0
2018-05-13 19:13:08.317846: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0:   N
2018-05-13 19:13:08.325655: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6540 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-05-13 19:13:09.481654: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-13 19:13:09.492392: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-13 19:13:09.507539: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929]      0
2018-05-13 19:13:09.514839: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0:   N
2018-05-13 19:13:09.522600: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6540 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
---------------------------------
Run id: pygta5-car-fast-0.001-alexnetv2-10-epochs-300K-data.model
Log directory: log/
---------------------------------
Training samples: 1500
Validation samples: 500
--
Training Step: 1  | time: 2.879s
| Momentum | epoch: 001 | loss: 0.00000 - acc: 0.0000 -- iter: 0064/1500
Training Step: 2  | total loss: 1.60460 | time: 4.542s

Performance during training

During training, the CPU is almost always above 90%, while GPU usage is around 0~25%.
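
To put numbers on the utilization described above, one option is to poll nvidia-smi while training runs in another window. The following is a minimal sketch, assuming nvidia-smi is on the PATH (on Windows it may need to be added manually) and that the GPU reports numeric values for both fields. The helper names parse_gpu_stats, read_gpu_stats and poll are hypothetical and not part of any library.

```python
import subprocess
import time

# nvidia-smi query that prints one CSV line per GPU: "<util %>, <memory MiB>"
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text):
    """Turn nvidia-smi CSV output into a list of (util_percent, mem_mib) tuples."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        stats.append((int(util), int(mem)))
    return stats


def read_gpu_stats():
    """Run nvidia-smi once and return the parsed stats (assumes it is on PATH)."""
    out = subprocess.check_output(QUERY, universal_newlines=True)
    return parse_gpu_stats(out)


def poll(samples=10, interval_s=1.0):
    """Print GPU stats `samples` times, sleeping `interval_s` between reads."""
    for _ in range(samples):
        print(read_gpu_stats())
        time.sleep(interval_s)
```

Calling poll() while the training script is running gives a rough time series of GPU utilization and memory use, which is easier to compare across runs than a glance at a monitoring tool.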

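Thank you for reading this long post. I can't seem to find where the problem lies from here. Any help would be greatly appreciated.

Best answer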

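Are you using your own model or existing code? If you are using your own implementation, the low GPU utilization may be caused by that specific implementation. Usually the data pipeline is the culprit. See the TF performance guide ( https://www.tensorflow.org/performance/performance_guide ) and the guide on importing data efficiently with tf.data.Dataset ( https://www.tensorflow.org/programmers_guide/datasets ).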
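
One way to check whether the data pipeline is the bottleneck is to time batch loading and the training step separately. The sketch below assumes a hand-written training loop in which load_batch and train_step are your own callables (both hypothetical here); it does not apply directly to a high-level fit() call that hides the loop. The function name time_split is also made up for illustration.

```python
import time


def time_split(load_batch, train_step, steps, clock=time.perf_counter):
    """Return (seconds spent loading, seconds spent training) over `steps` iterations.

    load_batch: callable returning one batch (hypothetical, supplied by you).
    train_step: callable taking a batch and running one optimizer step,
                e.g. a wrapper around sess.run (hypothetical, supplied by you).
    clock:      timer function, injectable so the helper can be tested.
    """
    load_s = 0.0
    train_s = 0.0
    for _ in range(steps):
        t0 = clock()
        batch = load_batch()
        t1 = clock()
        train_step(batch)
        t2 = clock()
        load_s += t1 - t0
        train_s += t2 - t1
    return load_s, train_s
```

If the loading share dominates, the GPU is probably idle while the CPU prepares data, which would be consistent with high CPU and low GPU usage. In that case the tf.data guide linked above, in particular batching and prefetching, is the place to look.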

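If it is not the data pipeline, some of your code may be executing on the CPU, which causes a lot of copying from GPU to CPU and back. Perhaps you perform some operations with NumPy or something similar that runs only on the CPU? The other usual guidelines also apply, such as avoiding placeholders and feed_dict where possible.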
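
To see which ops actually land on the CPU, TensorFlow 1.x can log device placement when the session is created with tf.Session(config=tf.ConfigProto(log_device_placement=True)). The sketch below does not call TensorFlow itself; it only summarizes a captured placement log, assuming each op line ends with a device string such as /job:localhost/replica:0/task:0/device:GPU:0. The exact line layout may vary between versions, so the regex looks only at the trailing device name. The function name count_ops_by_device is hypothetical.

```python
import re
from collections import Counter

# Matches a device name at the very end of a log line, e.g. ".../device:GPU:0"
DEVICE_RE = re.compile(r"/device:(CPU|GPU):(\d+)\s*$")


def count_ops_by_device(placement_log):
    """Count how many logged ops were placed on each device.

    placement_log: text captured from a run with log_device_placement=True
                   (assumed format: one op per line, device name at the end).
    Returns a Counter keyed by strings such as "GPU:0" or "CPU:0".
    """
    counts = Counter()
    for line in placement_log.splitlines():
        match = DEVICE_RE.search(line)
        if match:
            counts["%s:%s" % match.groups()] += 1
    return counts
```

A large number of compute-heavy ops under CPU:0 would point to placement rather than the input pipeline as the cause. A handful of CPU ops, such as those for saving checkpoints, is expected.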

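Overall, it would be more helpful if we could see your code or, if you are using existing code, know which model you are using. Without looking at your code it is hard to tell where the problem lies. Try downloading the MNIST CNN example directly from TensorFlow and running it on your machine. Its GPU-Util should be at least 80%. If that is not the case, then mos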

Regarding python - TensorFlow 1.8 GPU version does not seem to use the GPU on Windows, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/50321452/
