This is a follow-up question to the selected answer in this post: Output of cuda program is not what was expected.
While the following works:
__global__ void setVal(char **word)
{
    char *myWord = word[(blockIdx.y * gridDim.x) + blockIdx.x];
    myWord[0] = 'H';
    myWord[1] = 'e';
    myWord[2] = 'l';
    myWord[3] = 'l';
    myWord[4] = 'o';
}
Why doesn't this work?
__global__ void setVal(char **word)
{
    char *myWord = word[(blockIdx.y * gridDim.x) + blockIdx.x];
    myWord = "Hello\0";
}
Best answer
You should start paying more attention to the compiler's output. Your second kernel:
__global__ void setVal(char **word)
{
    char *myWord = word[(blockIdx.y * gridDim.x) + blockIdx.x];
    myWord = "Hello\0";
}
compiles to an empty kernel with nothing in it:
$ nvcc -arch=sm_20 -c nullkernel.cu
nullkernel.cu(3): warning: variable "myWord" was set but never used
nullkernel.cu(3): warning: variable "myWord" was set but never used
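The reason is that what you think is a string copy assignment is really just a pointer assignment. It makes the local variable myWord point at the string literal and never writes to the memory that word[...] refers to. The compiler is smart enough to see that nothing is written to memory through myWord, so it eliminates all the code and warns you that myWord is set but never used.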
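For comparison, here is a minimal sketch of a kernel that does copy the characters, by writing through the pointer one element at a time. The host helper runSetVal is hypothetical and not part of the original post. It assumes a single block and one 6-byte device buffer, and is only there to make the sketch self-contained.

```cuda
#include <cuda_runtime.h>

// Copies the message, including its terminating '\0', into the buffer
// that this block's entry of `word` points to.
__global__ void setVal(char **word)
{
    char *myWord = word[(blockIdx.y * gridDim.x) + blockIdx.x];
    const char msg[] = "Hello";               // 6 bytes: 'H','e','l','l','o','\0'
    for (int i = 0; i < (int)sizeof(msg); ++i) {
        myWord[i] = msg[i];                   // a real store to device memory
    }
}

// Hypothetical host helper (an assumption for illustration): launches the
// kernel on a 1x1 grid with one buffer and copies the 6 bytes back into
// `out`, which must hold at least 6 bytes. Returns false on a CUDA error.
bool runSetVal(char *out)
{
    char *d_buf = nullptr;     // device buffer for the characters
    char **d_words = nullptr;  // device array holding the single pointer

    if (cudaMalloc(&d_buf, 6) != cudaSuccess) return false;
    if (cudaMalloc(&d_words, sizeof(char *)) != cudaSuccess) {
        cudaFree(d_buf);
        return false;
    }

    // Store the device pointer d_buf in element 0 of the device pointer array.
    cudaMemcpy(d_words, &d_buf, sizeof(char *), cudaMemcpyHostToDevice);

    setVal<<<1, 1>>>(d_words);

    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(out, d_buf, 6, cudaMemcpyDeviceToHost);

    cudaFree(d_words);
    cudaFree(d_buf);
    return err == cudaSuccess;
}
```

Because the loop stores to myWord[i], the compiler can no longer discard the kernel body. Calling runSetVal with a 6-byte host array should leave the string Hello in it.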
If I were to ask a rhetorical question and rewrite the code this way:
__global__ void setVal(char **word)
{
    char *myWord = word[(blockIdx.y * gridDim.x) + blockIdx.x];
    const char[] mymsg = "Hello\0";
    myWord = mymsg;
}
would it be more obvious why the code can't compile, and why it could never "implicitly" perform a string copy assignment even if it did compile?
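To make the rhetorical point concrete, here is a sketch of a variant that does compile. It assumes two changes to the snippet: the array is declared with C++ syntax (const char mymsg[] rather than const char[] mymsg), and the local pointer is const-qualified so that assigning the array to it does not drop const. The names rebindOnly and runRebindOnly are hypothetical. The assignment still only rebinds the local pointer, so the device buffer keeps whatever it held before the launch.

```cuda
#include <cuda_runtime.h>

// Compilable variant of the rewritten kernel. Nothing is stored through
// myWord, so the compiler is free to drop the whole body (and will warn
// that myWord is set but never used).
__global__ void rebindOnly(char **word)
{
    const char *myWord = word[(blockIdx.y * gridDim.x) + blockIdx.x];
    const char mymsg[] = "Hello";
    myWord = mymsg;   // changes only the local pointer, not the buffer
}

// Hypothetical host helper (an assumption for illustration): fills a
// 6-byte device buffer with 'x', runs the kernel on a 1x1 grid, and
// copies the 6 bytes back into `out`. Returns false on a CUDA error.
bool runRebindOnly(char *out)
{
    char *d_buf = nullptr;
    char **d_words = nullptr;

    if (cudaMalloc(&d_buf, 6) != cudaSuccess) return false;
    if (cudaMalloc(&d_words, sizeof(char *)) != cudaSuccess) {
        cudaFree(d_buf);
        return false;
    }

    cudaMemset(d_buf, 'x', 6);   // known starting contents
    cudaMemcpy(d_words, &d_buf, sizeof(char *), cudaMemcpyHostToDevice);

    rebindOnly<<<1, 1>>>(d_words);

    cudaError_t err = cudaMemcpy(out, d_buf, 6, cudaMemcpyDeviceToHost);

    cudaFree(d_words);
    cudaFree(d_buf);
    return err == cudaSuccess;
}
```

After runRebindOnly returns, all six bytes in out should still be 'x'. That is the same outcome as the question's second kernel, just with the pointer assignment spelled out.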
Regarding Cuda char* variable assignment, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/6566910/