python - Why does Theano print "cc1plus: fatal error: cuda_runtime.h: No such file or directory"?

Tags: python cuda gpu nvcc theano

I am trying to use a GPU with Theano. I have read this tutorial.

However, I cannot get Theano to use the GPU, and I don't know how to proceed.

Test machine

$ cat /etc/issue
Welcome to openSUSE 12.1 "Asparagus" - Kernel \r (\l).
$ nvidia-smi -L
GPU 0: Tesla C2075 (S/N: 0324111084577)
$ echo $LD_LIBRARY_PATH
/usr/local/cuda-5.0/lib64:[other]:/usr/local/lib:/usr/lib:/usr/local/X11/lib:[other]
$ find /usr/local/ -name cuda_runtime.h
/usr/local/cuda-5.0/include/cuda_runtime.h
$ echo $C_INCLUDE_PATH
/usr/local/cuda-5.0/include/
$ echo $CXX_INCLUDE_PATH
/usr/local/cuda-5.0/include/
$ nvidia-smi -a
NVIDIA: could not open the device file /dev/nvidiactl (Permission denied).
Failed to initialize NVML: Insufficient Permissions
$ echo $PATH
/usr/lib64/mpi/gcc/openmpi/bin:/home/mthoma/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:.:/home/mthoma/bin
$ ls -l /dev/nv*
crw-rw---- 1 root video 195,   0  1. Jul 09:47 /dev/nvidia0
crw-rw---- 1 root video 195, 255  1. Jul 09:47 /dev/nvidiactl
crw-r----- 1 root kmem   10, 144  1. Jul 09:46 /dev/nvram
# nvidia-smi -a

==============NVSMI LOG==============

Timestamp                       : Wed Jul 30 05:13:52 2014
Driver Version                  : 304.33

Attached GPUs                   : 1
GPU 0000:04:00.0
    Product Name                : Tesla C2075
    Display Mode                : Enabled
    Persistence Mode            : Disabled
    Driver Model
        Current                 : N/A
        Pending                 : N/A
    Serial Number               : 0324111084577
    GPU UUID                    : GPU-7ea505ef-ad46-bb24-c440-69da9b300040
    VBIOS Version               : 70.10.46.00.05
    Inforom Version
        Image Version           : N/A
        OEM Object              : 1.1
        ECC Object              : 2.0
        Power Management Object : 4.0
    PCI
        Bus                     : 0x04
        Device                  : 0x00
        Domain                  : 0x0000
        Device Id               : 0x109610DE
        Bus Id                  : 0000:04:00.0
        Sub System Id           : 0x091010DE
        GPU Link Info
            PCIe Generation
                Max             : 2
                Current         : 1
            Link Width
                Max             : 16x
                Current         : 16x
    Fan Speed                   : 30 %
    Performance State           : P12
    Clocks Throttle Reasons     : N/A
    Memory Usage
        Total                   : 5375 MB
        Used                    : 39 MB
        Free                    : 5336 MB
    Compute Mode                : Default
    Utilization
        Gpu                     : 0 %
        Memory                  : 5 %
    Ecc Mode
        Current                 : Enabled
        Pending                 : Enabled
    ECC Errors
        Volatile
            Single Bit            
                Device Memory   : 0
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Texture Memory  : N/A
                Total           : 0
            Double Bit            
                Device Memory   : 0
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Texture Memory  : N/A
                Total           : 0
        Aggregate
            Single Bit            
                Device Memory   : 133276
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Texture Memory  : N/A
                Total           : 133276
            Double Bit            
                Device Memory   : 203730
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Texture Memory  : N/A
                Total           : 203730
    Temperature
        Gpu                     : 58 C
    Power Readings
        Power Management        : Supported
        Power Draw              : 33.83 W
        Power Limit             : 225.00 W
        Default Power Limit     : N/A
        Min Power Limit         : N/A
        Max Power Limit         : N/A
    Clocks
        Graphics                : 50 MHz
        SM                      : 101 MHz
        Memory                  : 135 MHz
    Applications Clocks
        Graphics                : N/A
        Memory                  : N/A
    Max Clocks
        Graphics                : 573 MHz
        SM                      : 1147 MHz
        Memory                  : 1566 MHz
    Compute Processes           : None

CUDA sample

Compiled and executed as superuser (tested with cuda/C/0_Simple/simpleMultiGPU):

# ldconfig /usr/local/cuda-5.0/lib64/
# ./simpleMultiGPU 
[simpleMultiGPU] starting...

CUDA-capable device count: 1
Generating input data...

Computing with 1 GPUs...
  GPU Processing time: 27.814000 (ms)

Computing with Host CPU...

Comparing GPU and Host CPU results...
  GPU sum: 16777296.000000
  CPU sum: 16777294.395033
  Relative difference: 9.566307E-08 

[simpleMultiGPU] test results...
PASSED

> exiting in 3 seconds: 3...2...1...done!

When I try this as a normal user, I get:

$ ./simpleMultiGPU 
[simpleMultiGPU] starting...

CUDA error at simpleMultiGPU.cu:87 code=38(cudaErrorNoDevice) "cudaGetDeviceCount(&GPU_N)" 
CUDA-capable device count: 0
Generating input data...

Floating point exception

How can I get CUDA to work for non-superusers?

Test code

The following code is from "Testing Theano with GPU":

#!/usr/bin/env python
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print f.maker.fgraph.toposort()
t0 = time.time()
for i in xrange(iters):
    r = f()
t1 = time.time()
print 'Looping %d times took' % iters, t1 - t0, 'seconds'
print 'Result is', r
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print 'Used the cpu'
else:
    print 'Used the gpu'

Error message

The complete error message is too long to post here. A longer version is at http://pastebin.com/eT9vbk7M, but I think the relevant part is:

cc1plus: fatal error: cuda_runtime.h: No such file or directory
compilation terminated.
ERROR (theano.sandbox.cuda): Failed to compile cuda_ndarray.cu: ('nvcc return status', 1, 'for cmd', 'nvcc -shared -g -O3 -m64 -Xcompiler -DCUDA_NDARRAY_CUH=bcb411d72e41f81f3deabfc6926d9728,-D NPY_ARRAY_ENSURECOPY=NPY_ENSURECOPY,-D NPY_ARRAY_ALIGNED=NPY_ALIGNED,-D NPY_ARRAY_WRITEABLE=NPY_WRITEABLE,-D NPY_ARRAY_UPDATE_ALL=NPY_UPDATE_ALL,-D NPY_ARRAY_C_CONTIGUOUS=NPY_C_CONTIGUOUS,-D NPY_ARRAY_F_CONTIGUOUS=NPY_F_CONTIGUOUS,-fPIC -Xlinker -rpath,/home/mthoma/.theano/compiledir_Linux-3.1.10-1.16-desktop-x86_64-with-SuSE-12.1-x86_64-x86_64-2.7.2/cuda_ndarray -Xlinker -rpath,/usr/local/cuda-5.0/lib -Xlinker -rpath,/usr/local/cuda-5.0/lib64 -I/usr/local/lib/python2.7/site-packages/Theano-0.6.0rc1-py2.7.egg/theano/sandbox/cuda -I/usr/local/lib/python2.7/site-packages/numpy-1.6.2-py2.7-linux-x86_64.egg/numpy/core/include -I/usr/include/python2.7 -o /home/mthoma/.theano/compiledir_Linux-3.1.10-1.16-desktop-x86_64-with-SuSE-12.1-x86_64-x86_64-2.7.2/cuda_ndarray/cuda_ndarray.so mod.cu -L/usr/local/cuda-5.0/lib -L/usr/local/cuda-5.0/lib64 -L/usr/lib64 -lpython2.7 -lcublas -lcudart')
WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu is not available

Standard output gives:

['nvcc', '-shared', '-g', '-O3', '-m64', '-Xcompiler', '-DCUDA_NDARRAY_CUH=bcb411d72e41f81f3deabfc6926d9728,-D NPY_ARRAY_ENSURECOPY=NPY_ENSURECOPY,-D NPY_ARRAY_ALIGNED=NPY_ALIGNED,-D NPY_ARRAY_WRITEABLE=NPY_WRITEABLE,-D NPY_ARRAY_UPDATE_ALL=NPY_UPDATE_ALL,-D NPY_ARRAY_C_CONTIGUOUS=NPY_C_CONTIGUOUS,-D NPY_ARRAY_F_CONTIGUOUS=NPY_F_CONTIGUOUS,-fPIC', '-Xlinker', '-rpath,/home/mthoma/.theano/compiledir_Linux-3.1.10-1.16-desktop-x86_64-with-SuSE-12.1-x86_64-x86_64-2.7.2/cuda_ndarray', '-Xlinker', '-rpath,/usr/local/cuda-5.0/lib', '-Xlinker', '-rpath,/usr/local/cuda-5.0/lib64', '-I/usr/local/lib/python2.7/site-packages/Theano-0.6.0rc1-py2.7.egg/theano/sandbox/cuda', '-I/usr/local/lib/python2.7/site-packages/numpy-1.6.2-py2.7-linux-x86_64.egg/numpy/core/include', '-I/usr/include/python2.7', '-o', '/home/mthoma/.theano/compiledir_Linux-3.1.10-1.16-desktop-x86_64-with-SuSE-12.1-x86_64-x86_64-2.7.2/cuda_ndarray/cuda_ndarray.so', 'mod.cu', '-L/usr/local/cuda-5.0/lib', '-L/usr/local/cuda-5.0/lib64', '-L/usr/lib64', '-lpython2.7', '-lcublas', '-lcudart']
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 3.25972604752 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761
  1.62323284]
Used the cpu
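
The compiler's search path in the failing command can be inspected mechanically. The following is a minimal Python 3 sketch, not part of the original question; the helper names `include_dirs` and `dirs_with_header` are made up for illustration. It pulls the `-I` directories out of an argument list like the one printed above and reports which of them contain `cuda_runtime.h`. nvcc normally adds its own toolkit include directory implicitly, so an empty result is only a hint, not proof of the cause.

```python
import os


def include_dirs(nvcc_args):
    """Return the directories passed as -I<dir> or as -I <dir>."""
    dirs = []
    args = iter(nvcc_args)
    for arg in args:
        if arg == "-I":
            # Separate form: the directory is the next argument, if any.
            nxt = next(args, None)
            if nxt is not None:
                dirs.append(nxt)
        elif arg.startswith("-I"):
            # Joined form: strip the two-character "-I" prefix.
            dirs.append(arg[2:])
    return dirs


def dirs_with_header(dirs, header="cuda_runtime.h"):
    """Return the subset of dirs that contain the given header file."""
    return [d for d in dirs if os.path.isfile(os.path.join(d, header))]


if __name__ == "__main__":
    # Abbreviated copy of the argument list from the log above.
    args = [
        "nvcc", "-shared", "-g", "-O3", "-m64",
        "-I/usr/local/lib/python2.7/site-packages/Theano-0.6.0rc1-py2.7.egg/theano/sandbox/cuda",
        "-I/usr/include/python2.7",
        "-L/usr/local/cuda-5.0/lib64", "-lcublas", "-lcudart",
    ]
    searched = include_dirs(args)
    print("explicit include dirs:", searched)
    print("dirs containing cuda_runtime.h:", dirs_with_header(searched))
```

Pasting the full list from the log into `args` gives the complete set of explicit include directories for that compile attempt.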

.theanorc

$ cat .theanorc 
[global]
device = gpu
floatX = float32
[cuda]
root = /usr/local/cuda-5.0
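
As a quick sanity check of this configuration, here is a small Python 3 sketch (an illustration only; the function name `cuda_header_path` is hypothetical). It reads the `[cuda]` `root` setting from text in the format shown above and builds the path where `cuda_runtime.h` would be expected. It assumes the header lives in `<root>/include`, which matches the `find` output earlier in the question.

```python
import configparser
import os


def cuda_header_path(theanorc_text, header="cuda_runtime.h"):
    """Return <root>/include/<header> for the [cuda] root setting, or None if unset."""
    parser = configparser.ConfigParser()
    parser.read_string(theanorc_text)
    # fallback=None covers both a missing [cuda] section and a missing root key.
    root = parser.get("cuda", "root", fallback=None)
    if root is None:
        return None
    return os.path.join(root, "include", header)


if __name__ == "__main__":
    # Assumes the file is in the home directory, as in the `cat .theanorc` call above.
    rc_file = os.path.expanduser("~/.theanorc")
    if os.path.isfile(rc_file):
        with open(rc_file) as handle:
            path = cuda_header_path(handle.read())
        if path is None:
            print("no [cuda] root setting found")
        else:
            print(path, "exists" if os.path.isfile(path) else "is missing")
    else:
        print("no ~/.theanorc found")
```

If the reported path exists, the `root` setting itself is probably fine and the cause lies elsewhere.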

Best answer

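As some of the comments said, the problem is the permissions on /dev/nvidia*. As others pointed out, this means the device was not initialized correctly during your boot process. Normally this is done correctly when the GUI starts. My guess is that you don't have a GUI enabled or installed, so you probably have a headless server.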
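
The permission problem can be confirmed without running any CUDA code. This is a minimal Python 3 sketch under the assumption that the driver's device files match `/dev/nvidia*`, as in the `ls -l` output in the question; the helper name `inaccessible` is made up for illustration. It lists the device files the current user cannot open for reading and writing.

```python
import glob
import os


def inaccessible(paths):
    """Return the paths the current user cannot access for both reading and writing."""
    return [p for p in paths if not os.access(p, os.R_OK | os.W_OK)]


if __name__ == "__main__":
    devices = sorted(glob.glob("/dev/nvidia*"))
    if not devices:
        print("no /dev/nvidia* device files found")
    for path in inaccessible(devices):
        print("no read/write access:", path)
```

Run it as the normal user that starts Theano. Any path it prints is a device file that user cannot use.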

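To fix this, just run nvidia-smi as root. It will detect that the device was not started correctly and fix it. root has permission to fix the problem; a normal user does not. That is why it works as root (which fixes it automatically) but not as a normal user.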

This fix has to be applied every time the computer boots. To automate it, create the file /etc/init.d/nvidia-gpu-config as root, with the following contents:

#!/bin/sh
#
# nvidia-gpu-config    Start the correct initialization of nvidia GPU driver.
#
# chkconfig: - 90 90
# description:  Init gpu to wanted states

# sudo /sbin/chkconfig --add nvidia-gpu-config
#

case $1 in
'start')
nvidia-smi
;;
esac

Then run this command as root: /sbin/chkconfig --add nvidia-gpu-config

Update: this works for operating systems that use the SysV init system. If your system uses systemd, I don't know whether it works.

Regarding "python - Why does Theano print "cc1plus: fatal error: cuda_runtime.h: No such file or directory"?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/25003733/
