machine-learning - Not sure whether tensorflow-gpu is actually using the GPU

Tags: machine-learning tensorflow anaconda keras

I am currently trying to run a convolutional neural network with Keras on the tensorflow backend, following a Udemy deep learning class. It runs extremely slowly, though: each epoch takes roughly 1,000 seconds, while the instructor's machine finishes one in about 60 seconds (and he runs it on the CPU, by the way).

The CNN is a simple image recognition network that classifies an image as either a cat or a dog. The training and test data comprise 10,000 images in total, taking up 237 MB on my SSD.

When I run the CNN in a Python shell, I get the following output:

Epoch 1/25
2017-05-28 13:23:03.967337: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:03.967574: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:03.968153: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:03.968329: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:03.968576: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:04.505726: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:887] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.835
pciBusID 0000:28:00.0
Total memory: 8.00GiB
Free memory: 6.68GiB
2017-05-28 13:23:04.505944: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:908] DMA: 0
2017-05-28 13:23:04.506637: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:918] 0:   Y
2017-05-28 13:23:04.506895: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:28:00.0)
2684/8000 [=========>....................] - ETA: 845s - loss: 0.5011 - acc: 0.7427

This seems to indicate that Tensorflow is running its computations on the GPU. However, when I check nvidia-smi, I get the following output:

 $ nvidia-smi
Sun May 28 13:25:46 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 376.53                 Driver Version: 376.53                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070   WDDM  | 0000:28:00.0      On |                  N/A |
|  0%   49C    P2    36W / 166W |   7240MiB /  8192MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      7676  C+G   ...ost_cw5n1h2txyewy\ShellExperienceHost.exe N/A      |
|    0      8580  C+G   Insufficient Permissions                     N/A      |
|    0      9704  C+G   ...x86)\Google\Chrome\Application\chrome.exe N/A      |
|    0     10532    C   ...\Anaconda3\envs\tensorflow-gpu\python.exe N/A      |
|    0     11384  C+G   Insufficient Permissions                     N/A      |
|    0     12896  C+G   C:\Windows\explorer.exe                      N/A      |
|    0     13868  C+G   Insufficient Permissions                     N/A      |
|    0     14068  C+G   Insufficient Permissions                     N/A      |
|    0     14568  C+G   Insufficient Permissions                     N/A      |
|    0     15260  C+G   ...osoftEdge_8wekyb3d8bbwe\MicrosoftEdge.exe N/A      |
|    0     16912  C+G   ...am Files (x86)\Dropbox\Client\Dropbox.exe N/A      |
|    0     18196  C+G   ...I\AppData\Local\hyper\app-1.3.3\Hyper.exe N/A      |
|    0     18228  C+G   ...oftEdge_8wekyb3d8bbwe\MicrosoftEdgeCP.exe N/A      |
|    0     20032  C+G   ...indows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A      |
+-----------------------------------------------------------------------------+

Note that every process is displayed as using both CPU and GPU (Type C+G), while the Tensorflow process is the only one using the CPU alone (Type C).
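
For reference, device visibility can also be checked from inside TensorFlow 1.x itself, independent of nvidia-smi; a minimal sketch using the standard device_lib utility and the log_device_placement session option:

# Minimal TF 1.x device check: list the devices TensorFlow can see and
# log where each op is placed (placement info is printed to the console).
import tensorflow as tf
from tensorflow.python.client import device_lib

print(device_lib.list_local_devices())  # should include a /gpu:0 entry

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    a = tf.constant([1.0, 2.0, 3.0], name='a')
    b = tf.constant([4.0, 5.0, 6.0], name='b')
    print(sess.run(a + b))  # the add op should be placed on /gpu:0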

Is there any reasonable explanation for this? I have been trying to fix this for the past week, with no progress at all.

I am running a Windows 10 Pro machine with an Asus Nvidia GTX 1070, 24 GB of RAM and an Intel Xeon X5670 CPU @ 2.93 GHz. I created the Anaconda environment with the following commands:

conda create -n tensorflow-gpu python=3.5 anaconda
source activate tensorflow-gpu
conda install theano 
conda install mingw libpython 
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-1.1.0-cp35-cp35m-win_amd64.whl
pip install keras
conda update --all 

I also installed the CUDA Toolkit and cuDNN and added their respective folders to my %PATH%.
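
As a quick sanity check on that setup: the Windows GPU build of TensorFlow 1.1 was, to my knowledge, built against CUDA 8.0 and cuDNN 5.1 and looks for those DLLs at import time, so a clean import already confirms the %PATH% entries:

# The GPU build looks for the CUDA/cuDNN DLLs (cudart64_80.dll,
# cudnn64_5.dll) at import time; a clean import means %PATH% is correct.
import tensorflow as tf
print(tf.__version__)  # expect 1.1.0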

Any help would be greatly appreciated.

[Edit]

The code, in case there is a problem with it:

# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense

# Defining the CNN
classifier = Sequential()
# Convolution 1
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
# Convolution 2
classifier.add(Conv2D(32, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
# Flatten + MLP
classifier.add(Flatten())
classifier.add(Dense(units = 128, activation = 'relu'))
classifier.add(Dense(units = 1, activation = 'sigmoid'))
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

# Fitting the CNN to the images
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

test_datagen = ImageDataGenerator(rescale = 1./255)

training_set = train_datagen.flow_from_directory('dataset/training_set',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')

test_set = test_datagen.flow_from_directory('dataset/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')

classifier.fit_generator(training_set,
                         steps_per_epoch = 8000,
                         epochs = 25,
                         validation_data = test_set,
                         validation_steps = 2000)

Best Answer

This has nothing to do with your machine; I discussed the problem in this post on Udemy. Everyone seems to have the same issue and wonders how it could take only 20 minutes on the instructor's machine. The answer is simple: the instructor posted different source code than the one he presents in the video!

Check the documentation for steps_per_epoch:

steps_per_epoch: Total number of steps (batches of samples) to yield from generator before declaring one epoch finished and starting the next epoch. It should typically be equal to the number of unique samples of your dataset divided by the batch size.

Right now you are taking 8000 * 32 = 256,000 images per epoch. That is the number of samples you process in every epoch, which makes no sense at all if you consider that your dataset is only 10,000 images (20k with augmentation).

If you check the video, you will see that the instructor uses samples_per_epoch, which means 32x less data. Case closed.
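
In the Keras 2 API that the code above uses, the equivalent fix is to pass batch counts instead of sample counts. A sketch, assuming the 8000/2000 figures were meant as image counts and using the batch size of 32 from the generators:

# steps_per_epoch and validation_steps count batches, not individual samples
classifier.fit_generator(training_set,
                         steps_per_epoch = 8000 // 32,   # 250 batches = 8000 images per epoch
                         epochs = 25,
                         validation_data = test_set,
                         validation_steps = 2000 // 32)  # 62 batches ~ 2000 validation images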

The original question, "machine-learning - Not sure whether tensorflow-gpu is actually using the GPU", can be found on Stack Overflow: https://stackoverflow.com/questions/44228639/
