opencv - OpenCL not detecting my AMD GPU using OpenCV

Tags: opencv image-processing opencl x86-64 opencv3.0

I am using an AMD Radeon R9 M375. I tried following this answer, https://stackoverflow.com/a/34250412/8731839, but it did not work for me.

I also followed this: http://answers.opencv.org/question/108646/opencl-can-not-detect-my-nvidia-gpu-via-opencv/?answer=108784#post-id-108784

Here is my clinfo.exe output:

  Platform Name:                 AMD Accelerated Parallel Processing
Number of devices:               2
  Device Type:                   CL_DEVICE_TYPE_GPU
  Vendor ID:                     1002h
  Board name:                    AMD Radeon (TM) R9 M375
  Device Topology:               PCI[ B#4, D#0, F#0 ]
  Max compute units:                 10
  Max work items dimensions:             3
    Max work items[0]:               256
    Max work items[1]:               256
    Max work items[2]:               256
  Max work group size:               256
  Preferred vector width char:           4
  Preferred vector width short:          2
  Preferred vector width int:            1
  Preferred vector width long:           1
  Preferred vector width float:          1
  Preferred vector width double:         1
  Native vector width char:          4
  Native vector width short:             2
  Native vector width int:           1
  Native vector width long:          1
  Native vector width float:             1
  Native vector width double:            1
  Max clock frequency:               1015Mhz
  Address bits:                  32
  Max memory allocation:             3019898880
  Image support:                 Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                16384
  Max image 2D height:               16384
  Max image 3D width:                2048
  Max image 3D height:               2048
  Max image 3D depth:                2048
  Max samplers within kernel:            16
  Max size of kernel argument:           1024
  Alignment (bits) of base address:      2048
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     No
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                    Read/Write
  Cache line size:               64
  Cache size:                    16384
  Global memory size:                3221225472
  Constant buffer size:              65536
  Max number of constant args:           8
  Local memory type:                 Scratchpad
  Local memory size:                 32768
  Max pipe arguments:                0
  Max pipe active reservations:          0
  Max pipe packet size:              0
  Max global variable size:          0
  Max global variable preferred total size:  0
  Max read/write image args:             0
  Max on device events:              0
  Queue on device max size:          0
  Max on device queues:              0
  Queue on device preferred size:        0
  SVM capabilities:              
    Coarse grain buffer:             No
    Fine grain buffer:               No
    Fine grain system:               No
    Atomics:                     No
  Preferred platform atomic alignment:       0
  Preferred global atomic alignment:         0
  Preferred local atomic alignment:      0
  Kernel Preferred work group size multiple:     64
  Error correction support:          0
  Unified memory for Host and Device:        0
  Profiling timer resolution:            1
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:                
    Execute OpenCL kernels:          Yes
    Execute native function:             No
  Queue on Host properties:              
    Out-of-Order:                No
    Profiling :                  Yes
  Queue on Device properties:                
    Out-of-Order:                No
    Profiling :                  No
  Platform ID:                   00007FFF209D0188
  Name:                      Capeverde
  Vendor:                    Advanced Micro Devices, Inc.
  Device OpenCL C version:           OpenCL C 1.2 
  Driver version:                2348.3
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 1.2 AMD-APP (2348.3)
  Extensions:                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event cl_amd_liquid_flash


      Device Type:                   CL_DEVICE_TYPE_CPU
      Vendor ID:                     1002h
      Board name:                    
      Max compute units:                 4
      Max work items dimensions:             3
        Max work items[0]:               1024
        Max work items[1]:               1024
        Max work items[2]:               1024
      Max work group size:               1024
      Preferred vector width char:           16
      Preferred vector width short:          8
      Preferred vector width int:            4
      Preferred vector width long:           2
      Preferred vector width float:          8
      Preferred vector width double:         4
      Native vector width char:          16
      Native vector width short:             8
      Native vector width int:           4
      Native vector width long:          2
      Native vector width float:             8
      Native vector width double:            4
      Max clock frequency:               2200Mhz
      Address bits:                  64
      Max memory allocation:             2147483648
      Image support:                 Yes
      Max number of images read arguments:       128
      Max number of images write arguments:      64
      Max image 2D width:                8192
      Max image 2D height:               8192
      Max image 3D width:                2048
      Max image 3D height:               2048
      Max image 3D depth:                2048
      Max samplers within kernel:            16
      Max size of kernel argument:           4096
      Alignment (bits) of base address:      1024
      Minimum alignment (bytes) for any datatype:    128
      Single precision floating point capability
        Denorms:                     Yes
        Quiet NaNs:                  Yes
        Round to nearest even:           Yes
        Round to zero:               Yes
        Round to +ve and infinity:           Yes
        IEEE754-2008 fused multiply-add:         Yes
      Cache type:                    Read/Write
      Cache line size:               64
      Cache size:                    32768
      Global memory size:                8499593216
      Constant buffer size:              65536
      Max number of constant args:           8
      Local memory type:                 Global
      Local memory size:                 32768
      Max pipe arguments:                16
      Max pipe active reservations:          16
      Max pipe packet size:              2147483648
      Max global variable size:          1879048192
      Max global variable preferred total size:  1879048192
      Max read/write image args:             64
      Max on device events:              0
      Queue on device max size:          0
      Max on device queues:              0
      Queue on device preferred size:        0
      SVM capabilities:              
        Coarse grain buffer:             No
        Fine grain buffer:               No
        Fine grain system:               No
        Atomics:                     No
      Preferred platform atomic alignment:       0
      Preferred global atomic alignment:         0
      Preferred local atomic alignment:      0
      Kernel Preferred work group size multiple:     1
      Error correction support:          0
      Unified memory for Host and Device:        1
      Profiling timer resolution:            465
      Device endianess:              Little
      Available:                     Yes
      Compiler available:                Yes
      Execution capabilities:                
        Execute OpenCL kernels:          Yes
        Execute native function:             Yes
      Queue on Host properties:              
        Out-of-Order:                No
        Profiling :                  Yes
      Queue on Device properties:                
        Out-of-Order:                No
        Profiling :                  No
      Platform ID:                   00007FFF209D0188
      Name:                      Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
      Vendor:                    GenuineIntel
      Device OpenCL C version:           OpenCL C 1.2 
      Driver version:                2348.3 (sse2,avx)
      Profile:                   FULL_PROFILE
      Version:                   OpenCL 1.2 AMD-APP (2348.3)

What works:
std::vector<cv::ocl::PlatformInfo> platforms;
cv::ocl::getPlatfomsInfo(platforms);

//OpenCL Platforms
for (size_t i = 0; i < platforms.size(); i++)
{

    //Access to Platform
    const cv::ocl::PlatformInfo* platform = &platforms[i];

    //Platform Name
    std::cout << "Platform Name: " << platform->name().c_str() << "\n";
    //Access Device within Platform
    cv::ocl::Device current_device;
    for (int j = 0; j < platform->deviceNumber(); j++)
    {
        //Access Device
        platform->getDevice(current_device, j);
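        //Device Type (cv::ocl::Device::TYPE_CPU == 2, cv::ocl::Device::TYPE_GPU == 4)
        int deviceType = current_device.type();
        //deviceNumber() is the count of devices on the platform, not the index j
        std::cout << "Device Number: " << platform->deviceNumber() << std::endl;
        std::cout << "Device Type: " << deviceType << std::endl;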
    }
}

The code above prints:
 Platform Name: Intel(R) OpenCL
 Device Number: 2
 Device Type: 2
 Device Number: 2
 Device Type: 4 
 Platform Name: AMD Accelerated Parallel Processing
 Device Number: 2
 Device Type: 4 
 Device Number: 2
 Device Type: 2 
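
To make this output easier to read: the Device Type numbers are the raw values returned by cv::ocl::Device::type(), which follow the OpenCL CL_DEVICE_TYPE_* bit flags (2 for CPU, 4 for GPU). The sketch below decodes them with a hypothetical helper, deviceTypeName, which is not part of OpenCV and uses only the standard library.

```cpp
#include <string>

// Hypothetical helper (not an OpenCV function): turns the integer returned by
// cv::ocl::Device::type() into a readable label. The bit values mirror the
// OpenCL CL_DEVICE_TYPE_* flags.
std::string deviceTypeName(int type) {
    const int kCpu = 1 << 1;          // 2
    const int kGpu = 1 << 2;          // 4
    const int kAccelerator = 1 << 3;  // 8
    // Test the GPU bit first: OpenCV's discrete/integrated GPU types set
    // extra high bits on top of the GPU bit.
    if (type & kGpu) return "GPU";
    if (type & kCpu) return "CPU";
    if (type & kAccelerator) return "ACCELERATOR";
    return "UNKNOWN";
}
```

Read this way, each platform in the output above reports one CPU device (2) and one GPU device (4), which matches the clinfo listing for the AMD platform.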

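How do I create a context from here that uses the AMD device as my GPU? The linked post says to use the initializeContextFromHandler method, but the OpenCV documentation for it is not sufficient. Documentation Link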

Best answer

The problem is solved. I don't know exactly what I did, but the AMD GPU is working now.

Current setup (on Windows):

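  • Environment variable:
    Name: OPENCV_OPENCL_DEVICE
    
    Value: AMD:GPU:Capeverde
    
  • Use setUseOpenCL(bool foo), declared in ocl.hpp, to choose whether to use the GPU or the CPU: it turns OpenCV's OpenCL acceleration on or off.

  • Most probable cause: my actual code was not doing any computation, but once I wrote a simple program that subtracts two matrices, the AMD GPU started working.

    Code: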
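    #include <cstdio>
    #include <iostream>
    #include <opencv2/core/ocl.hpp>
    #include <opencv2/opencv.hpp>
    
    int main() {
        // UMat operations can be dispatched to OpenCL when it is available and enabled.
        cv::UMat mat1 = cv::UMat::ones(10, 10, CV_32F);
        cv::UMat mat2 = cv::UMat::zeros(10, 10, CV_32F);
        cv::UMat output = cv::UMat(10, 10, CV_32F);
        cv::subtract(mat1, mat2, output);
        std::cout << output << "\n";
        std::getchar();  // Wait for a key press so the console window stays open.
        return 0;
    }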
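
The OPENCV_OPENCL_DEVICE value in the setup above has three colon-separated fields: platform, device type, and device name. As a minimal sketch, assuming the <Platform>:<Type>:<DeviceName or ID> layout, the helper below splits such a value using only the standard library. OclDeviceSpec and parseOclDeviceSpec are hypothetical names for illustration. OpenCV does its own parsing internally, and this is not that code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical type (not part of OpenCV): the three colon-separated fields
// of an OPENCV_OPENCL_DEVICE value.
struct OclDeviceSpec {
    std::string platform;  // e.g. "AMD"
    std::string type;      // e.g. "GPU"; may be empty
    std::string device;    // e.g. "Capeverde"; may be empty
};

// Splits "<platform>:<type>:<device>" on the first two colons.
// Fields that are missing from the input are left empty.
OclDeviceSpec parseOclDeviceSpec(const std::string& value) {
    OclDeviceSpec spec;
    const std::size_t first = value.find(':');
    spec.platform = value.substr(0, first);  // whole string if no colon
    if (first == std::string::npos) return spec;

    const std::size_t second = value.find(':', first + 1);
    if (second == std::string::npos) {
        spec.type = value.substr(first + 1);
        return spec;
    }
    spec.type = value.substr(first + 1, second - first - 1);
    spec.device = value.substr(second + 1);
    return spec;
}
```

Parsing AMD:GPU:Capeverde this way gives platform AMD, type GPU, and device Capeverde, the name clinfo reports for the Radeon R9 M375.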
    

Source: "opencv - OpenCL not detecting my AMD GPU using OpenCV" on Stack Overflow: https://stackoverflow.com/questions/48998571/
