python - Iterating over a 2D array in PyCUDA

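I am trying to iterate over a 2D array in PyCUDA, but I end up with repeated array values. I first passed in a small array of random integers and it worked as expected, but when I pass in an image, I see the same values over and over again.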

Here is my code:

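import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

img = np.random.randint(20, size=(4, 5))
print("Input array")
print(img)
img_size = img.shape
print(img_size)

# nbytes is the number of bytes occupied by the NumPy array
img_gpu = cuda.mem_alloc(img.nbytes)
# Copy the array from host (CPU) memory to device (GPU) memory
cuda.memcpy_htod(img_gpu, img)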


mod = SourceModule("""
#include <stdio.h>
__global__ void AHE(int *a, int row, int col)
{
int i = threadIdx.x+ blockIdx.x* blockDim.x;
int j = threadIdx.y+ blockIdx.y* blockDim.y;
if(i==0 && j ==0)
printf("Output array ");
if(i <row && j < col)
{
    printf(" %d",a[j + i*col]);
}
}
""")

col = np.int32(img.shape[-1])
row = np.int32(img.shape[0])
func = mod.get_function("AHE")
func(img_gpu, row, col, block=(32,32,1))
img_ahe = np.empty_like(img)
cuda.memcpy_dtoh(img_ahe, img_gpu)

Now, when I replace the random integer array with an image loaded as a NumPy array, this is what I have:

import cv2

img = cv2.imread('Chest.jpg', 0)
img_size = img.shape
print(img_size)

# nbytes is the number of bytes occupied by the NumPy array
img_gpu = cuda.mem_alloc(img.nbytes)
# Copy the array from host (CPU) memory to device (GPU) memory
cuda.memcpy_htod(img_gpu, img)

mod = SourceModule("""
#include <stdio.h>
__global__ void AHE(int *a, int row, int col)
{
int i = threadIdx.x+ blockIdx.x* blockDim.x;
int j = threadIdx.y+ blockIdx.y* blockDim.y;
if(i==0 && j ==0)
printf("Output array ");
if(i <row && j < col)
{
    printf(" %d",a[j + i*col]);
}
}
""")
#Gives you the number of columns
col = np.int32(img.shape[-1])
row = np.int32(img.shape[0])
func = mod.get_function("AHE")
func(img_gpu, row, col, block=(32,32,1))
img_ahe = np.empty_like(img)
cuda.memcpy_dtoh(img_ahe, img_gpu)

Best answer

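The problem here is that the image you are loading does not store its pixel values as signed integers. This modification of your example behaves more like you would expect: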

import pycuda.driver as cuda
from pycuda.compiler import SourceModule
import numpy as np
import cv2 

import pycuda.autoinit

img = cv2.imread('Chest.jpg', 0)
img_size = img.shape
print(img_size)
print(img.dtype)

# nbytes is the number of bytes occupied by the NumPy array
img_gpu = cuda.mem_alloc(img.nbytes)
# Copy the array from host (CPU) memory to device (GPU) memory
cuda.memcpy_htod(img_gpu, img)

mod = SourceModule("""
#include <stdio.h>
__global__ void AHE(unsigned char *a, int row, int col)
{
int i = threadIdx.x+ blockIdx.x* blockDim.x;
int j = threadIdx.y+ blockIdx.y* blockDim.y;
if(i==0 && j ==0)
printf("Output array ");
if(i <row && j < col)
{
    int val = int(a[j + i*col]);
    printf(" %d", val);
}
}
""")
#Gives you the number of columns
col = np.int32(img.shape[-1])
row = np.int32(img.shape[0])
func = mod.get_function("AHE")
func(img_gpu, row, col, block=(32,32,1))
img_ahe = np.empty_like(img)
cuda.memcpy_dtoh(img_ahe, img_gpu)

Running this code prints:

$ python image.py 
(681, 1024)
uint8
Output array  244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 244 245 245 245 246 246 246 246 246 246 246 246 246 246 246 244 244 244 244 244 244 244 244 245 245 245 245 245 245 245 245 244 244 245 245 245 246 246 246

[Output truncated for brevity]

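Note the dtype of the image: uint8. Your code was trying to treat a stream of unsigned 8-bit values as 32-bit integers. Technically, it should produce a runtime error if the launch covered the full image, because the kernel would read past the end of the image buffer: it reads 4 bytes per pixel rather than 1. You do not see this, however, because you only launch a single block, and your input image is presumably at least four times larger than the 32 x 32 block you run.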
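Both points in the paragraph above can be checked on the host with NumPy alone. The following is a minimal sketch, not part of the original answer: the small `pixels` array is a made-up stand-in for the image, and the shape (681, 1024) and block size 32 are taken from the run shown earlier. It first reinterprets uint8 bytes as 32-bit ints, the way an `int *a` parameter does, and then works out how far a single 32 x 32 block reads and what grid would be needed to cover the whole image.

```python
import numpy as np

# Hypothetical stand-in for the uint8 array returned by cv2.imread(..., 0).
pixels = np.array([[244, 244, 244, 244, 245, 245, 245, 245],
                   [246, 246, 246, 246, 244, 244, 244, 244]], dtype=np.uint8)

# Reinterpret the same 16 bytes as 32-bit ints, which is what a kernel
# parameter declared `int *a` does with the uint8 buffer.
as_int32 = pixels.reshape(-1).view(np.int32)
print(as_int32)  # 4 values, each packing four neighbouring pixels

# Image shape and block size taken from the run shown in the answer.
rows, cols = 681, 1024
block = 32

# Largest element index a single 32x32 block touches via a[j + i*col].
max_index = (block - 1) + (block - 1) * cols
bytes_read_as_int = (max_index + 1) * 4   # int* reads 4 bytes per element
image_bytes = rows * cols                 # uint8 buffer: 1 byte per pixel
print(bytes_read_as_int, image_bytes, bytes_read_as_int <= image_bytes)

# Grid needed to cover the whole image, keeping the kernel's convention
# that x indexes rows and y indexes columns (ceiling division).
grid = ((rows + block - 1) // block, (cols + block - 1) // block)
print(grid)
```

With a single block the reads stay inside the buffer, which is why no error appears. In PyCUDA the computed tuple would be passed as the `grid=` keyword next to `block=(32,32,1)`; with the original `int *` signature, a launch covering the whole image would index past the end of the allocation.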

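By the way, PyCUDA is very good at managing and enforcing type safety for CUDA calls, but your code cleverly defeats every mechanism PyCUDA could use to detect a type mismatch in the kernel call. PyCUDA includes an excellent GPUArray class, and you should familiarise yourself with it. If you had used a GPUArray instance here, you would have gotten a type mismatch runtime error that pointed you to the exact source of the problem the first time you tried to run the code.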
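When working with raw `mem_alloc` buffers, a similar early warning can be had by checking the dtype on the host before copying. This is only an illustrative sketch and not PyCUDA's own mechanism: the helper `check_kernel_dtype` and the table `C_TYPE_TO_DTYPE` are hypothetical names introduced here, and the mapping covers just a few common C types.

```python
import numpy as np

# Hypothetical mapping from the C type in the kernel signature to the
# NumPy dtype the host buffer must have.
C_TYPE_TO_DTYPE = {
    "unsigned char": np.uint8,
    "int": np.int32,
    "float": np.float32,
}


def check_kernel_dtype(array, c_type):
    """Raise TypeError if `array` cannot be read through a `c_type *` pointer."""
    expected = np.dtype(C_TYPE_TO_DTYPE[c_type])
    if array.dtype != expected:
        raise TypeError(
            "kernel expects %s (%s) but array is %s"
            % (c_type, expected, array.dtype))
    return array


image = np.zeros((4, 5), dtype=np.uint8)    # stand-in for a grayscale image
check_kernel_dtype(image, "unsigned char")  # matches the corrected kernel

try:
    check_kernel_dtype(image, "int")        # the mismatch from the question
except TypeError as err:
    print(err)
```

Calling the helper just before `cuda.memcpy_htod` would turn the silent reinterpretation into an immediate error on the host.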

Source: a similar question about iterating over a 2D array in PyCUDA on Stack Overflow: https://stackoverflow.com/questions/44057516/
