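I'm trying to pass a 2D array to a kernel so that each thread can access the element at index = threadIdx.x + (blockIdx.x * blockDim.x), but I can't figure out how to do this or how to copy the data back to the host.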
size_t pitch;
cudaMallocPitch(&d_array, &pitch, block_size * sizeof(int), num_blocks);
cudaMemset2D(d_array, pitch, 0, block_size * sizeof(int), num_blocks * sizeof(int));
kernel<<<grid_size, block_size>>>(d_array, pitch);
cudaMemcpy2D(h_array, pitch, d_array, pitch, block_size, num_blocks, cudaMemcpyDeviceToHost);
for (int block = 0; block < num_blocks; ++block)
    for (int thread = 0; thread < block_size; ++thread)
        ; // expected: h_array[block][thread] == 1
__global__ void kernel(int *array, int pitch) {
int *row = (int*)((char*)array + blockIdx.x * pitch);
row[threadIdx.x] = 1;
return;
}
What am I doing wrong here?
Best answer
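Your cudaMemset2D call touches more memory than you allocated with cudaMallocPitch, and your cudaMemcpy2D call copies only a small part of that memory. In both functions the width argument is in bytes and the height argument is a number of rows. For example, with block_size = 32, num_blocks = 4 and a 4-byte int, the memset asks for 16 rows when only 4 were allocated, and the copy moves 32 bytes per row, which is only the first 8 ints of each row.
You should call the functions as follows: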
cudaMallocPitch(&d_array, &pitch, block_size * sizeof(int), num_blocks);
cudaMemset2D(d_array, pitch, 0, block_size * sizeof(int), num_blocks); // height is a row count: multiplying it by sizeof(int) exceeds the allocation
kernel<<<grid_size, block_size>>>(d_array, pitch);
cudaMemcpy2D(h_array, pitch, d_array, pitch, block_size * sizeof(int) /* you forgot this here */, num_blocks, cudaMemcpyDeviceToHost);
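For reference, here is a minimal self-contained sketch of the whole round trip. It is not the asker's actual program: the helper run_demo, the CUDA_CHECK macro and the sizes 4 and 32 are made up for illustration, and it assumes the host buffer is densely packed. That is why the destination pitch passed to cudaMemcpy2D is block_size * sizeof(int) rather than the device pitch; reusing the device pitch on the host side is only valid if the host rows really are pitch bytes apart.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            std::fprintf(stderr, "%s failed: %s\n", #call,          \
                         cudaGetErrorString(err_));                 \
            std::exit(EXIT_FAILURE);                                \
        }                                                           \
    } while (0)

// One block per row, one thread per column; pitch is the row stride in bytes.
__global__ void kernel(int *array, size_t pitch) {
    int *row = (int *)((char *)array + blockIdx.x * pitch);
    row[threadIdx.x] = 1;
}

// Hypothetical driver: returns how many host elements are not 1 afterwards.
int run_demo(int num_blocks, int block_size) {
    const size_t width_bytes = block_size * sizeof(int);

    // Assumption: the host buffer is dense, so its row stride is width_bytes.
    int *h_array = (int *)std::malloc(width_bytes * num_blocks);
    if (h_array == nullptr) {
        std::fprintf(stderr, "host allocation failed\n");
        std::exit(EXIT_FAILURE);
    }

    int *d_array = nullptr;
    size_t pitch = 0;
    // Width in bytes, height in rows.
    CUDA_CHECK(cudaMallocPitch((void **)&d_array, &pitch, width_bytes, num_blocks));
    CUDA_CHECK(cudaMemset2D(d_array, pitch, 0, width_bytes, num_blocks));

    kernel<<<num_blocks, block_size>>>(d_array, pitch);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());

    // Destination pitch is the dense host stride; source pitch is the device pitch.
    CUDA_CHECK(cudaMemcpy2D(h_array, width_bytes, d_array, pitch,
                            width_bytes, num_blocks, cudaMemcpyDeviceToHost));

    int mismatches = 0;
    for (int block = 0; block < num_blocks; ++block)
        for (int thread = 0; thread < block_size; ++thread)
            if (h_array[block * block_size + thread] != 1) ++mismatches;

    CUDA_CHECK(cudaFree(d_array));
    std::free(h_array);
    return mismatches;
}

int main() {
    // Sizes chosen only for illustration; block_size must not exceed the
    // device's maximum threads per block.
    int mismatches = run_demo(4, 32);
    std::printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Checking the result of every runtime call, as the sketch does, would likely have surfaced the oversized cudaMemset2D in the question as an error instead of a silent failure.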
Regarding "c++ - managing 2D CUDA arrays", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18516427/