python - TensorFlow-GPU causes python to crash

I am running into problems with tensorflow-gpu 1.6.0.

I am working on the final assignment for the Coursera course "Bayesian Methods for Machine Learning".

https://www.coursera.org/learn/bayesian-methods-in-machine-learning

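When I run the code on the GPU with tensorflow-gpu (pip install tensorflow-gpu), python crashes, but if I run the same code on the CPU with standard tensorflow (pip install tensorflow), it runs quickly with no errors or crashes. Of course, I uninstalled the GPU version before installing the standard one, and vice versa.

When python crashes, the debugger shows this message: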
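
A lighter way to make the same CPU-versus-GPU comparison, without uninstalling and reinstalling packages, is to hide the GPU from the process through the `CUDA_VISIBLE_DEVICES` environment variable, so that tensorflow-gpu falls back to the CPU. The sketch below is only an illustration: the helper name `force_cpu` is hypothetical, and it assumes the variable is set before TensorFlow is imported in a fresh interpreter or notebook kernel.

```python
import os


def force_cpu():
    """Hide all CUDA devices from this process.

    Hypothetical helper: it only sets an environment variable, so it has
    an effect only if it runs before TensorFlow is imported.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


force_cpu()
print(os.environ["CUDA_VISIBLE_DEVICES"])

# Import TensorFlow only after the variable is set, for example:
# import tensorflow as tf
```

If the code runs cleanly this way and still crashes with the GPU visible, the Python code itself is unlikely to be at fault and the GPU libraries (driver, CUDA, cuDNN) become the main suspects.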

Unhandled exception at 0x00007FFDAB4DB79E (ucrtbase.dll) in python.exe

This is the setup code:

import numpy as np
import matplotlib.pyplot as plt
from IPython.display import clear_output
import tensorflow as tf
import GPy
import GPyOpt
import keras
from keras.layers import Input, Dense, Lambda, InputLayer, concatenate, Activation, Flatten, Reshape
from keras.layers.normalization import BatchNormalization
from keras.layers.convolutional import Conv2D, Deconv2D
from keras.losses import MSE
from keras.models import Model, Sequential
from keras import backend as K
from keras import metrics
from keras.datasets import mnist
from keras.utils import np_utils
from tensorflow.python.framework import ops
from tensorflow.python.framework import dtypes
import utils
import os
%matplotlib inline

sess = tf.InteractiveSession()
K.set_session(sess)

latent_size = 8

vae, encoder, decoder = utils.create_vae(batch_size=128, latent=latent_size)
sess.run(tf.global_variables_initializer())
vae.load_weights('CelebA_VAE_small_8.h5')

K.set_learning_phase(False)

latent_placeholder = tf.placeholder(tf.float32, (1, latent_size))
decode = decoder(latent_placeholder)

This code crashes python when executed on the GPU, but not on the CPU:

plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i+1)
    image = sess.run(decode, feed_dict={latent_placeholder: np.random.normal([0]*latent_size,[1]*latent_size)[:, np.newaxis].T})[0]### YOUR CODE HERE
    plt.imshow(np.clip(image, 0, 1))
    plt.axis('off')

Additional information:

Python version 3.6.4
tensorflow 1.6.0
tensorflow-gpu 1.6.0
cuDNN 7.1.1 for CUDA 9.0
CUDA 9.0 with patches 1 and 2
GPU: 1080 Ti, driver 391.01

You can find the Python notebook and the weights on WeTransfer: https://wetransfer.com/downloads/59b9011823d38c204b5ef5a2b58f5e8e20180311201808/32c900

Best answer

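I found the problem. cuDNN 7.1.1 does not work with tensorflow-gpu yet. I downgraded cuDNN to 7.0.5, and now the code works as expected.

If you have the same problem I did, you have to downgrade cuDNN!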
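
To confirm which cuDNN version is actually installed before and after downgrading, one option is to read the version macros from the `cudnn.h` header that ships with cuDNN 7.x. The following is a minimal sketch, not part of the original answer: the function name `parse_cudnn_version` is hypothetical, the header text here is an inline sample, and the real location of `cudnn.h` depends on where CUDA was installed.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from the text of a cuDNN 7.x cudnn.h.

    Hypothetical helper: it looks for the CUDNN_MAJOR, CUDNN_MINOR and
    CUDNN_PATCHLEVEL #define lines and raises if any of them is missing.
    """
    parts = []
    for name in ("MAJOR", "MINOR", "PATCHLEVEL"):
        match = re.search(r"#define\s+CUDNN_" + name + r"\s+(\d+)", header_text)
        if match is None:
            raise ValueError("CUDNN_" + name + " not found in header text")
        parts.append(int(match.group(1)))
    return tuple(parts)


# Inline sample standing in for the contents of cudnn.h (assumption:
# in practice, read the file from the include directory of the CUDA install).
sample_header = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 1
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""

installed = parse_cudnn_version(sample_header)
known_good = (7, 0, 5)  # the version reported as working in the answer above

if installed != known_good:
    print("cuDNN %d.%d.%d found; the answer above reports 7.0.5 as working" % installed)
```

For a real check, replace `sample_header` with the text read from the installed `cudnn.h`.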

Regarding python - TensorFlow-GPU causes python to crash, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49224425/
