python - PyOpenCL: indexing a 3D array in kernel code

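I'm using PyOpenCL to process images in Python, and I send a 3D numpy array (height x width x 4) to the kernel. I can't figure out how to index the 3D array inside the kernel code. Right now all I can do is copy the whole input array to the output. The current code looks like this, where img is an image with img.shape = (320, 512, 4):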

__kernel void part1(__global float* img, __global float* results)
{
    unsigned int x = get_global_id(0);
    unsigned int y = get_global_id(1);
    unsigned int z = get_global_id(2);

    int index = x + 320*y + 320*512*z;

    results[index] = img[index];
}

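However, I don't really understand how this works. For example, how do I index the equivalent of Python's img[1, 2, 3] inside this kernel? And if I want to store an item in results so that it ends up at results[1, 2, 3] in the numpy array once I get the results back in Python, which index should I use for results?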

To run this, I use the following Python code:

import pyopencl as cl
import numpy as np

class OpenCL:
    def __init__(self):
        self.ctx = cl.create_some_context()
        self.queue = cl.CommandQueue(self.ctx)

    def loadProgram(self, filename):
        # Read the kernel source file and build it for this context
        with open(filename, 'r') as f:
            fstr = f.read()
        self.program = cl.Program(self.ctx, fstr).build()

    def opencl_energy(self, img):
        mf = cl.mem_flags

        self.img = img.astype(np.float32)

        self.img_buf = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=self.img)
        self.dest_buf = cl.Buffer(self.ctx, mf.WRITE_ONLY, self.img.nbytes)

        # Global size = img.shape, i.e. one work-item per array element
        self.program.part1(self.queue, self.img.shape, None, self.img_buf, self.dest_buf)
        c = np.empty_like(self.img)
        # enqueue_copy is the current replacement for enqueue_read_buffer
        cl.enqueue_copy(self.queue, c, self.dest_buf).wait()
        return c

example = OpenCL()
example.loadProgram("get_energy.cl")
image = np.random.rand(320, 512, 4)
image = image.astype(np.float32)
results = example.opencl_energy(image)
print("All items are equal:", (results==image).all())
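The kernel's index expression can be checked on the host in plain Python, with no OpenCL device needed. With the global size set to img.shape = (320, 512, 4), get_global_id(0) runs over 0..319, get_global_id(1) over 0..511 and get_global_id(2) over 0..3. The sketch below (kernel_index and c_order_index are hypothetical helper names) shows that x + 320*y + 320*512*z gives every work-item a distinct offset, which is why the copy and the equality check succeed, but that it makes x the fastest-varying index, whereas numpy's default C (row-major) layout makes the last axis fastest.

```python
# Sketch: compare the question's kernel index with numpy's C-order offset.
H, W, C = 320, 512, 4  # img.shape from the question


def kernel_index(x, y, z):
    # Same arithmetic as the kernel: x varies fastest (Fortran-like order).
    return x + H * y + H * W * z


def c_order_index(i, j, k):
    # Offset of img[i, j, k] in a C-contiguous array of shape (H, W, C).
    return (i * W + j) * C + k


# Every work-item gets a distinct offset in [0, H*W*C), so reading and
# writing the same offset copies the whole buffer element for element.
seen = {kernel_index(x, y, z) for x in range(H) for y in range(W) for z in range(C)}
print(len(seen) == H * W * C, min(seen), max(seen))  # True 0 655359

# But work-item (1, 2, 3) does not touch the element numpy calls img[1, 2, 3].
print(kernel_index(1, 2, 3), c_order_index(1, 2, 3))  # 492161 2059
```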

Best answer

Update: the OpenCL documentation states (in section 3.5) that

"Memory objects are categorized into two types: buffer objects, and image objects. A buffer
object stores a one-dimensional collection of elements whereas an image object is used to store a
two- or three- dimensional texture, frame-buffer or image."

So a buffer is always linear, or linearized, as you can see from my example below.

import pyopencl as cl
import numpy as np


h_a = np.arange(27).reshape((3,3,3)).astype(np.float32)

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

mf = cl.mem_flags
d_a  = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=h_a)

prg = cl.Program(ctx, """
__kernel void p(__global const float *d_a) {
  printf("Array element is %f ",d_a[10]);
}
""").build()

prg.p(queue, (1,), None, d_a)
queue.finish()  # wait for the kernel so its printf output is flushed

gives me

"Array element is 10.000000"

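as output. So the buffer really is a linearized array. However, the naive [x, y, z] indexing you know from numpy does not work on it directly; you have to compute the flat offset yourself. Using 2-D or 3-D images instead of buffers should work, though.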
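To tie this back to the question: for a C-contiguous array (numpy's default, which both img and the np.empty_like result are here), element [i, j, k] of an array with shape (H, W, C) sits at flat offset (i*W + j)*C + k. So img[1, 2, 3] is img[2059] inside the kernel, and writing to results[2059] lands at results[1, 2, 3] after the buffer is copied back. The sketch below computes these offsets in plain Python (they should agree with numpy's np.ravel_multi_index and np.unravel_index in the default order='C'), confirms that d_a[10] above is h_a[1, 0, 1], and includes a hypothetical rewrite of part1 using that formula; the kernel source is only stored as a string and has not been run on a device.

```python
# Sketch: row-major (C-order) flattening, as numpy uses by default.
def ravel(idx, shape):
    # Flat offset of a[idx] in a C-contiguous array: last axis varies fastest.
    off = 0
    for i, n in zip(idx, shape):
        off = off * n + i
    return off


def unravel(off, shape):
    # Inverse of ravel: recover the multi-index from a flat offset.
    idx = []
    for n in reversed(shape):
        off, r = divmod(off, n)
        idx.append(r)
    return tuple(reversed(idx))


print(unravel(10, (3, 3, 3)))            # (1, 0, 1): d_a[10] is h_a[1, 0, 1]
print(ravel((1, 2, 3), (320, 512, 4)))   # 2059: img[1, 2, 3] in the question

# Hypothetical rewrite of part1 with C-order indexing, assuming the launch
# keeps global size = img.shape = (H, W, 4) as in the question's host code.
KERNEL_SRC = """
__kernel void part1(__global const float* img, __global float* results)
{
    size_t i = get_global_id(0);   /* row:     0..H-1 */
    size_t j = get_global_id(1);   /* column:  0..W-1 */
    size_t k = get_global_id(2);   /* channel: 0..3   */
    size_t W = get_global_size(1);
    size_t C = get_global_size(2);
    size_t index = (i * W + j) * C + k;  /* same offset as img[i, j, k] */
    results[index] = img[index];
}
"""
```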

A similar question about python - PyOpenCL: indexing a 3D array in kernel code can be found on Stack Overflow: https://stackoverflow.com/questions/32188144/
