cuda - Is it possible to launch a CUDA kernel with the grid size/block size defined at run time?

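I would like to know whether a CUDA kernel can be launched with the grid/block size specified at run time, rather than at compile time as is usually done.

Any help with this would be greatly appreciated.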

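Best answer

Yes. The grid and block sizes in a kernel launch are ordinary run-time values. In CUDA applications, specifying a fixed size for the grid is rarely useful. Most of the time the block size is fixed, while the grid size is kept dynamic and varies with the size of the input data. Consider the following vector addition example.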

__global__ void kernel(const float* a, const float* b, float* c, int length)
{
    // Global index of this thread across the whole grid
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // Bounds check: the grid may contain more threads than vector elements
    if (tid < length)
        c[tid] = a[tid] + b[tid];
}

int addVectors(const float* a, const float* b, float* c, int length)
{
    // a, b, c are allocated on the device

    // Nothing to do for an empty vector; a grid of size 0 is an invalid launch
    if (length <= 0)
        return 0;

    // Fix the block size to an appropriate value
    dim3 block(128);

    // The grid size depends on the length of the vector, known only at run time.
    // The total number of threads is rounded up to the nearest multiple of the
    // block size, so there are at least as many threads as vector elements.
    dim3 grid((length + block.x - 1) / block.x);

    kernel<<<grid, block>>>(a, b, c, length);

    return 0;
}
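The example above fixes the block size and computes only the grid size at run time. Since the question also asks about the block size, here is a minimal sketch in which both are run-time values. The names `addKernel`, `gridSizeFor` and `addVectorsRuntime` are hypothetical, introduced only for illustration, and the sketch assumes a 1D launch on the default stream with host data held in `std::vector`. It is one possible way to structure this, not the only one.

```cuda
#include <cuda_runtime.h>
#include <vector>

__global__ void addKernel(const float* a, const float* b, float* c, int length)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < length)              // the last block may be only partly used
        c[tid] = a[tid] + b[tid];
}

// Hypothetical helper: number of blocks needed to cover `length` elements
// (ceiling division).
unsigned int gridSizeFor(unsigned int length, unsigned int blockSize)
{
    return (length + blockSize - 1) / blockSize;
}

// Hypothetical host wrapper: `blockSize` is supplied by the caller at run
// time, for example from a command-line argument or a configuration file.
cudaError_t addVectorsRuntime(const std::vector<float>& a,
                              const std::vector<float>& b,
                              std::vector<float>& c,
                              unsigned int blockSize)
{
    const unsigned int length = static_cast<unsigned int>(a.size());
    if (length == 0 || b.size() != a.size() || blockSize == 0)
        return cudaErrorInvalidValue;

    c.resize(length);
    const size_t bytes = length * sizeof(float);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaError_t err = cudaMalloc(&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dB, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dC, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        // Both launch parameters are plain run-time values.
        dim3 block(blockSize);
        dim3 grid(gridSizeFor(length, blockSize));
        addKernel<<<grid, block>>>(dA, dB, dC, static_cast<int>(length));
        // An invalid configuration (e.g. a block larger than the device
        // allows) is reported here rather than at compile time.
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);

    // cudaFree accepts null pointers, so this is safe after a partial failure.
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```

A call such as `addVectorsRuntime(a, b, c, 256)` would launch with 256 threads per block. Because the block size is no longer a constant chosen by the programmer, it is worth checking the returned `cudaError_t`: the device's limit can be queried through the `maxThreadsPerBlock` field of `cudaDeviceProp`.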

A similar question on this topic can be found on Stack Overflow: https://stackoverflow.com/questions/14973176/
