cuda - What does the nvprof output "No kernels were profiled" mean, and how do I fix it?

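I recently installed CUDA on my Arch Linux machine through the system's package manager, and I have been trying to test whether it works by running a simple vector addition program.
I simply copy-pasted the code from this tutorial (the one using one or more kernels) into a file named cuda_test.cu and ran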

> nvcc cuda_test.cu -o cuda_test

In either case the program runs and I get no errors (the program does not crash and the output reports no error). But when I try to run the CUDA profiler on the program:

> sudo nvprof ./cuda_test

I get:

==3201== NVPROF is profiling process 3201, command: ./cuda_test
Max error: 0
==3201== Profiling application: ./cuda_test
==3201== Profiling result:
No kernels were profiled.
No API activities were profiled.
==3201== Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.

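The last warning is not my main concern or the subject of my question; my problem is the message saying that no kernels and no API activities were profiled.
Does this mean the program ran entirely on my CPU? Or is it a bug in nvprof?
I found a discussion about the same error here, but the answer there was that the wrong version of CUDA was installed. In my case, the installed version is the latest one available through the system package manager (Version 10.1.243-1).
Is there any way to get nvprof to show the expected output?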
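On the CPU question: CUDA has no CPU fallback for `__global__` functions, so if the launch had failed, `y` would still hold `2.0f` and the program would print "Max error: 1". "Max error: 0" therefore suggests the kernel did run on the GPU. To check this explicitly, independently of nvprof, one can query the device and check for launch and execution errors. A minimal sketch, assuming a single CUDA-capable device at index 0; `checkCuda` and `markRan` are hypothetical names introduced here, not part of the CUDA API:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a readable message if a CUDA call failed.
static void checkCuda(cudaError_t err, const char *what) {
  if (err != cudaSuccess) {
    std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    std::exit(EXIT_FAILURE);
  }
}

// Hypothetical kernel: each thread writes 1 into its own slot.
__global__ void markRan(int *flags) {
  flags[threadIdx.x] = 1;
}

int main() {
  int deviceCount = 0;
  checkCuda(cudaGetDeviceCount(&deviceCount), "cudaGetDeviceCount");
  std::printf("CUDA devices: %d\n", deviceCount);

  cudaDeviceProp prop;
  checkCuda(cudaGetDeviceProperties(&prop, 0), "cudaGetDeviceProperties");
  std::printf("Device 0: %s\n", prop.name);

  const int n = 32;
  int *flags = nullptr;
  checkCuda(cudaMallocManaged(&flags, n * sizeof(int)), "cudaMallocManaged");
  for (int i = 0; i < n; i++) flags[i] = 0;

  markRan<<<1, n>>>(flags);
  // Launch errors (no device, bad configuration, ...) are reported here...
  checkCuda(cudaGetLastError(), "kernel launch");
  // ...and errors raised while the kernel executes are reported on sync.
  checkCuda(cudaDeviceSynchronize(), "cudaDeviceSynchronize");

  int ran = 0;
  for (int i = 0; i < n; i++) ran += flags[i];
  std::printf("threads that ran on the GPU: %d of %d\n", ran, n);

  checkCuda(cudaFree(flags), "cudaFree");
  return 0;
}
```

If every thread reports in and no error is printed, the GPU is working and the empty nvprof report is a profiler-side problem rather than a sign that the code ran on the CPU.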
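Edit
Trying to follow the final warning does not solve the problem:
adding a call to cudaProfilerStop() (or cuProfilerStop()), adding cudaDeviceReset(); at the end as suggested, including the appropriate header (cuda_profiler_api.h or cudaProfiler.h), linking the appropriate library and compiling with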

> nvcc cuda_test.cu -o cuda_test -lcuda

produces a program that still runs, but running nvprof on it returns:

==12558== NVPROF is profiling process 12558, command: ./cuda_test
Max error: 0
==12558== Profiling application: ./cuda_test
==12558== Profiling result:
No kernels were profiled.
No API activities were profiled.
==12558== Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.
======== Error: Application received signal 139

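This did not solve the original problem and instead caused a new error; the same thing happens when cudaProfilerStop() is used alone, or together with cuProfilerStop() and cudaDeviceReset().
Code
As mentioned above, the code was copied from the tutorial to test whether CUDA works, although I also added the calls to cudaProfilerStop() and cudaDeviceReset(). It is included here for clarity: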

#include <iostream>
#include <math.h>
#include <cuda_profiler_api.h>

// Kernel function to add the elements of two arrays
__global__
void add(int n, float *x, float *y)
{
  int index = threadIdx.x;
  int stride = blockDim.x;
  for (int i = index; i < n; i += stride)
      y[i] = x[i] + y[i];
}

int main(void)
{
  int N = 1<<20;
  float *x, *y;

  cudaProfilerStart();

  // Allocate Unified Memory – accessible from CPU or GPU
  cudaMallocManaged(&x, N*sizeof(float));
  cudaMallocManaged(&y, N*sizeof(float));

  // initialize x and y arrays on the host
  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  // Run kernel on 1M elements on the GPU
  add<<<1, 1>>>(N, x, y);

  // Wait for GPU to finish before accessing on host
  cudaDeviceSynchronize();

  // Check for errors (all values should be 3.0f)
  float maxError = 0.0f;
  for (int i = 0; i < N; i++)
    maxError = fmax(maxError, fabs(y[i]-3.0f));
  std::cout << "Max error: " << maxError << std::endl;

  // Free memory
  cudaFree(x);
  cudaFree(y);

  cudaDeviceReset();
  cudaProfilerStop();

  return 0;
}

Best answer

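This problem is apparently fairly well known. After some searching I found this thread about the error code in the edited version; the solution discussed there is to call nvprof with the flag --unified-memory-profiling off: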

> sudo nvprof --unified-memory-profiling off ./cuda_test

This makes nvprof work as expected, even without calling cudaProfilerStop().
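The flag turns off unified-memory profiling; the program itself still uses managed memory. If unified-memory profiling is indeed what trips nvprof on this setup (an assumption, not something the linked thread confirms for every configuration), another option is to avoid `cudaMallocManaged` altogether and use explicit device allocations with `cudaMalloc` and `cudaMemcpy`, so there is no Unified Memory activity to profile. A minimal sketch reusing the question's `add` kernel; the `CHECK` macro is a hypothetical helper, and `cudaProfilerStop()` is placed before `cudaDeviceReset()` so the context still exists when the profiler is asked to flush:

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_profiler_api.h>
#include <cuda_runtime.h>

// Hypothetical helper: print the failing call's error and abort.
#define CHECK(call)                                              \
  do {                                                           \
    cudaError_t err_ = (call);                                   \
    if (err_ != cudaSuccess) {                                   \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                   cudaGetErrorString(err_));                    \
      std::exit(EXIT_FAILURE);                                   \
    }                                                            \
  } while (0)

// Same kernel as in the question.
__global__ void add(int n, float *x, float *y) {
  int index = threadIdx.x;
  int stride = blockDim.x;
  for (int i = index; i < n; i += stride)
    y[i] = x[i] + y[i];
}

int main() {
  const int N = 1 << 20;
  const size_t bytes = N * sizeof(float);

  // Host data lives in ordinary host memory.
  std::vector<float> hx(N, 1.0f), hy(N, 2.0f);

  // Device copies in plain device memory: no Unified Memory involved.
  float *dx = nullptr, *dy = nullptr;
  CHECK(cudaMalloc(&dx, bytes));
  CHECK(cudaMalloc(&dy, bytes));
  CHECK(cudaMemcpy(dx, hx.data(), bytes, cudaMemcpyHostToDevice));
  CHECK(cudaMemcpy(dy, hy.data(), bytes, cudaMemcpyHostToDevice));

  // One block of 256 threads; the stride loop covers all N elements.
  add<<<1, 256>>>(N, dx, dy);
  CHECK(cudaGetLastError());
  // This blocking copy waits for the kernel before reading dy back.
  CHECK(cudaMemcpy(hy.data(), dy, bytes, cudaMemcpyDeviceToHost));

  float maxError = 0.0f;
  for (int i = 0; i < N; i++)
    maxError = std::fmax(maxError, std::fabs(hy[i] - 3.0f));
  std::printf("Max error: %g\n", maxError);

  CHECK(cudaFree(dx));
  CHECK(cudaFree(dy));
  // Optional: flush profiler data while the context is alive, then reset.
  cudaProfilerStop();
  CHECK(cudaDeviceReset());
  return 0;
}
```

It builds the same way as the original (nvcc cuda_test.cu -o cuda_test). Whether plain sudo nvprof ./cuda_test then lists the add kernel on this particular driver and toolkit combination has not been verified.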

Original question on Stack Overflow: https://stackoverflow.com/questions/57611689/
