C++ AMP exception in a simple image processing example

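I am trying to teach myself C++ AMP and would like to start with a very simple task from my own field, image processing. I want to convert a 24-bit-per-pixel RGB image (a bitmap) into an 8-bit-per-pixel grayscale image. The image data is available in unsigned char arrays (obtained from Bitmap::LockBits(...) and so on).

I know that C++ AMP, for some reason, cannot handle char or unsigned char data through array or array_view, so I tried to use a texture instead, following a blog post on the subject. That post explains how to write to 8bpp textures, although Visual Studio 2013 tells me that writeonly_texture_view is deprecated.

My code throws a runtime exception saying "Failed to dispatch kernel." The full text of the exception is lengthy: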

ID3D11DeviceContext::Dispatch: The Unordered Access View (UAV) in slot 0 of the Compute Shader unit has the Format (R8_UINT). This format does not support being read from a shader as as UAV. This mismatch is invalid if the shader actually uses the view (e.g. it is not skipped due to shader code branching). It was unfortunately not possible to have all hardware implementations support reading this format as a UAV, despite that the format can written to as a UAV. If the shader only needs to perform reads but not writes to this resource, consider using a Shader Resource View instead of a UAV.

The code I have so far is this:

namespace gpu = concurrency;

gpu::extent<3> inputExtent(height, width, 3);
gpu::graphics::texture<unsigned int, 3> inputTexture(inputExtent, eight);
gpu::graphics::copy((void*)inputData24bpp, dataLength, inputTexture);
gpu::graphics::texture_view<unsigned int, 3> inputTexView(inputTexture);
gpu::graphics::texture<unsigned int, 2> outputTexture(width, height, eight);
gpu::graphics::writeonly_texture_view<unsigned int, 2> outputTexView(outputTexture);

gpu::parallel_for_each(outputTexture.extent,
    [inputTexView, outputTexView](gpu::index<2> pix) restrict(amp) {
    gpu::index<3> indR(pix[0], pix[1], 0);
    gpu::index<3> indG(pix[0], pix[1], 1);
    gpu::index<3> indB(pix[0], pix[1], 2);
    unsigned int sum = inputTexView[indR] + inputTexView[indG] + inputTexView[indB];
    outputTexView.set(pix, sum / 3);
});

gpu::graphics::copy(outputTexture, outputData8bpp);

What is the reason for this exception, and what can I do as a workaround?

Best answer

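I have also been teaching myself C++ AMP and ran into a problem very similar to yours, except that in my case I needed to process 16-bit images.

The problem can probably be solved with textures, but I cannot help you there because I lack the experience.

So what I did is basically based on bit masking.

First, trick the compiler into letting you compile: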

unsigned int* sourceData = reinterpret_cast<unsigned int*>(source);
unsigned int* destData   = reinterpret_cast<unsigned int*>(dest);

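Next, your array_view has to cover all of your data. Note that the view treats your data as 32-bit elements, so you have to convert the element count (divide by 2 for 16-bit data, by 4 for 8-bit data).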

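// size is the number of 16-bit pixels; two of them fit into one 32-bit element.
// (size + 1) / 2 rounds up, so the underlying buffers must be padded accordingly.
concurrency::array_view<const unsigned int> sourceView((size + 1) / 2, sourceData);
concurrency::array_view<unsigned int> destView((size + 1) / 2, destData);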
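
To make the size conversion concrete, here is a minimal sketch in plain C++ (no C++ AMP needed). The helper name wordsNeeded is hypothetical and not part of any library; it only illustrates the rounded-up division described above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of 32-bit words needed to hold `count`
// elements of `bitsPerElement` bits each (8 or 16), rounded up.
std::size_t wordsNeeded(std::size_t count, unsigned bitsPerElement)
{
    assert(bitsPerElement == 8 || bitsPerElement == 16);
    const std::size_t perWord = 32 / bitsPerElement;  // 4 for 8-bit, 2 for 16-bit
    return (count + perWord - 1) / perWord;           // ceiling division
}
```

If the pixel count is not a multiple of the elements per word, the buffer behind the reinterpreted pointer has to be padded to the rounded-up size; otherwise the view would read past the end of the data.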

Now you can write a typical parallel_for_each block.

typedef concurrency::array_view<const unsigned int> OriginalImage;
typedef concurrency::array_view<unsigned int> ResultImage;

bool Filters::Filter_Invert()
{
    const int size = k_width*k_height;
    const int maxVal = GetMaxSize();

    OriginalImage& im_original = GetOriginal();
    ResultImage& im_result = GetResult();
    im_result.discard_data();

    parallel_for_each(
        concurrency::extent<2>(k_width, k_height), 
        [=](concurrency::index<2> idx) restrict(amp)
    {
        const int pos = GetPos(idx);
        const int val = read_int16(im_original, pos);

        write_int16(im_result, pos, maxVal - val);
    });

    return true;
}

int Filters::GetPos( const concurrency::index<2>& idx )  restrict(amp, cpu)
{
    return idx[0] * Filters::k_height + idx[1];
}

Here comes the magic:

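template <typename T>
unsigned int read_int16(T& arr, int idx) restrict(amp, cpu)
{
    // Two 16-bit values share one 32-bit element: idx >> 1 selects the element,
    // (idx & 0x1) << 4 is the bit offset (0 or 16) inside it.
    const unsigned int shift = (idx & 0x1) << 4;
    return (arr[idx >> 1] & (0xFFFFu << shift)) >> shift;
}

template<typename T>
void write_int16(T& arr, int idx, unsigned int val) restrict(amp, cpu)
{
    const unsigned int shift = (idx & 0x1) << 4;
    // Clear the old 16-bit field (x ^ x == 0), then XOR in the new value.
    atomic_fetch_xor(&arr[idx >> 1], arr[idx >> 1] & (0xFFFFu << shift));
    atomic_fetch_xor(&arr[idx >> 1], (val & 0xFFFFu) << shift);
}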
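
The two XOR steps in the write helper can be followed on a single 32-bit value. The sketch below is a CPU-only illustration in plain C++; the function replaceField16 is a hypothetical name introduced here, and it uses ordinary (non-atomic) operations, so it shows the arithmetic only, not the thread safety.

```cpp
#include <cassert>
#include <cstdint>

// CPU-only sketch of the clear-then-set XOR trick for one 16-bit field.
// `slot` is 0 for the low half of the word and 1 for the high half.
std::uint32_t replaceField16(std::uint32_t word, unsigned slot, std::uint32_t val)
{
    assert(slot < 2);
    const unsigned shift = slot * 16;
    const std::uint32_t mask = 0xFFFFu << shift;
    word ^= (word & mask);              // x ^ x == 0, so the field becomes zero
    word ^= (val & 0xFFFFu) << shift;   // 0 ^ v == v, so the field becomes val
    return word;
}
```

For example, replaceField16(0xAAAABBBBu, 0, 0x1234u) yields 0xAAAA1234: only the low half changes. As far as I can tell, this is why the trick tolerates two threads writing the two halves of the same element: each XOR touches only the bits of its own field, provided each XOR is performed atomically.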

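Note that this approach is for 16-bit data, not 8-bit, but adapting it to 8 bits should not be too difficult. In fact, it is based on an 8-bit version; unfortunately, I cannot find the reference.

Hope this helps.

David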
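
As a rough idea of what an 8-bit adaptation could look like, here is a CPU-only sketch in plain C++. It is not the 8-bit version referred to above: the names read_uint8, write_uint8 and grayPixel are hypothetical, std::vector stands in for the array_view, and plain XOR stands in for atomic_fetch_xor. It also assumes tightly packed pixel data (no row padding) and a little-endian byte layout.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit counterparts of read_int16 / write_int16, CPU-only.
// Four 8-bit values share one 32-bit word: idx >> 2 selects the word,
// (idx & 0x3) << 3 is the bit offset (0, 8, 16 or 24) inside it.
std::uint32_t read_uint8(const std::vector<std::uint32_t>& arr, int idx)
{
    assert(idx >= 0);
    const unsigned shift = (idx & 0x3) << 3;
    return (arr[idx >> 2] >> shift) & 0xFFu;
}

void write_uint8(std::vector<std::uint32_t>& arr, int idx, std::uint32_t val)
{
    assert(idx >= 0);
    const unsigned shift = (idx & 0x3) << 3;
    std::uint32_t& word = arr[idx >> 2];
    word ^= word & (0xFFu << shift);    // clear the old byte
    word ^= (val & 0xFFu) << shift;     // store the new byte
}

// Average the three channels of pixel `pix` in a packed 24bpp buffer
// and store the result in a packed 8bpp buffer.
void grayPixel(const std::vector<std::uint32_t>& rgb,
               std::vector<std::uint32_t>& gray, int pix)
{
    const std::uint32_t sum = read_uint8(rgb, 3 * pix)
                            + read_uint8(rgb, 3 * pix + 1)
                            + read_uint8(rgb, 3 * pix + 2);
    write_uint8(gray, pix, sum / 3);
}
```

In a C++ AMP kernel the two XOR lines in write_uint8 would presumably become atomic_fetch_xor calls, as in the 16-bit helper, because four neighbouring pixels share one 32-bit element.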

Source: a similar question on Stack Overflow, https://stackoverflow.com/questions/22518262/
