python - Theano crashes when using cuDNN on Linux

Tags: python linux theano theano-cuda

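I am a non-root user on a cluster computer running Scientific Linux release 6.6 (Carbon).

I am getting Theano crashes when running code on a GPU with CUDA 7.5 and cuDNN 5. I am using Python 2.7, Theano 0.9, Keras 1.0.7 and Lasagne 0.1.

The crash below occurs only when I run my program on a GPU node with cuDNN enabled. On the CPU, and on the GPU with cuDNN disabled, the code completes without any problems.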

Traceback (most recent call last):
  File "runner.py", line 306, in <module>
    main()
  File "runner.py", line 241, in main
    queries_exp = __import__(args.exp_model).queries_exp
  File "/mnt/nfs2/inf/tjb32/workspace/CNN_EL/nlp-entity-convnet/exp_multi_conv_cosim.py", line 923, in <module>
    queries_exp = EntityVectorLinkExp()
  File "/mnt/nfs2/inf/tjb32/workspace/CNN_EL/nlp-entity-convnet/exp_multi_conv_cosim.py", line 51, in __init__
    self._setup()
  File "/mnt/nfs2/inf/tjb32/workspace/CNN_EL/nlp-entity-convnet/exp_multi_conv_cosim.py", line 543, in _setup
    on_unused_input='ignore',
  File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/compile/function.py", line 326, in function
    output_keys=output_keys)
  File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/compile/pfunc.py", line 484, in pfunc
    output_keys=output_keys)
  File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/compile/function_module.py", line 1788, in orig_function
    output_keys=output_keys).create(
  File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/compile/function_module.py", line 1467, in __init__
    optimizer_profile = optimizer(fgraph)
  File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/gof/opt.py", line 102, in __call__
    return self.optimize(fgraph)
  File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/gof/opt.py", line 90, in optimize
    ret = self.apply(fgraph, *args, **kwargs)
  File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/gof/opt.py", line 235, in apply
    sub_prof = optimizer.optimize(fgraph)
  File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/gof/opt.py", line 90, in optimize
    ret = self.apply(fgraph, *args, **kwargs)
  File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/gof/opt.py", line 235, in apply
    sub_prof = optimizer.optimize(fgraph)
  File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/gof/opt.py", line 90, in optimize
    ret = self.apply(fgraph, *args, **kwargs)
  File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/gof/opt.py", line 2262, in apply
    lopt_change = self.process_node(fgraph, node, lopt)
  File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/gof/opt.py", line 1825, in process_node
    lopt, node)
  File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/gof/opt.py", line 1719, in warn_inplace
    return NavigatorOptimizer.warn(exc, nav, repl_pairs, local_opt, node)
  File "/home/t/tj/tjb32/.local/lib/python2.7/site-packages/theano/gof/opt.py", line 1705, in warn
    raise exc
AssertionError

My .theanorc looks like this:

[global]
floatX = float32
device = gpu

[lib]
cnmem = 1

[nvcc]
fastmath = True

My shell profile looks like this:

export LD_LIBRARY_PATH=/home/t/tj/tjb32/cuda/lib64:$LD_LIBRARY_PATH 
export CPATH=/home/t/tj/tjb32/cuda/include:$CPATH
export LIBRARY_PATH=/home/t/tj/tjb32/cuda/lib64:$LD_LIBRARY_PATH
export PATH=/home/t/tj/tjb32/cuda/bin:$PATH

When I query Theano, it returns the following, which suggests that Theano is interacting with both CUDA and cuDNN:

Using gpu device 0: Tesla K20m (CNMeM is enabled with initial size: 95.0% of memory, cuDNN 5005)
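
The 5005 in that banner is cuDNN's integer version code. As a minimal sketch, assuming the encoding that cudnn.h used for cuDNN releases of this era (major * 1000 + minor * 100 + patch), it can be decoded to a dotted version. The helper name decode_cudnn_version is hypothetical and not part of Theano or cuDNN.

```python
def decode_cudnn_version(code):
    """Split an integer cuDNN version code into (major, minor, patch).

    Assumes the encoding major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


if __name__ == "__main__":
    # 5005 is the value reported in the Theano banner above.
    print("%d.%d.%d" % decode_cudnn_version(5005))
```

Under that assumption the banner corresponds to cuDNN 5.0.5, which matches the cuDNN 5 install described in the question.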

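I am fairly sure I have installed CUDA and cuDNN correctly. I would appreciate it if anyone could suggest any other configuration step I may have missed that could be causing the program to crash with cuDNN.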

Best answer

Not sure if this is the problem, but shouldn't

export LIBRARY_PATH=/home/t/tj/tjb32/cuda/lib64:$LD_LIBRARY_PATH

be

export LIBRARY_PATH=/home/t/tj/tjb32/cuda/lib64:$LIBRARY_PATH

instead?
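
To see what the mix-up actually does to the search paths, it may help to list which directories in LIBRARY_PATH (used at link time) and LD_LIBRARY_PATH (used at load time) contain a cuDNN library. The following is a minimal sketch, not a confirmed fix: the helper name find_library and the file pattern libcudnn* are assumptions made for illustration, and nothing here is part of Theano.

```python
import glob
import os


def find_library(env, var, pattern="libcudnn*"):
    """Return [(directory, [matching file names])] for each entry of env[var].

    Entries are reported in search order. Empty entries (from a leading,
    trailing or doubled separator) are skipped here for simplicity.
    """
    results = []
    for directory in env.get(var, "").split(os.pathsep):
        if not directory:
            continue
        matches = sorted(glob.glob(os.path.join(directory, pattern)))
        results.append((directory, [os.path.basename(m) for m in matches]))
    return results


if __name__ == "__main__":
    for var in ("LIBRARY_PATH", "LD_LIBRARY_PATH"):
        print(var)
        for directory, matches in find_library(os.environ, var):
            print("  %s -> %s" % (directory, ", ".join(matches) or "no match"))
```

If two directories in the same variable both report a libcudnn file, the earlier one wins, so a system-wide copy listed ahead of the one in the home directory could be picked up instead of the intended cuDNN 5 build.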

For more on python - Theano crashes when using cuDNN on Linux, see the original question on Stack Overflow: https://stackoverflow.com/questions/39257060/
