assembly - RGB to grayscale conversion using ARM NEON

Tags: assembly arm computer-vision neon

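I am trying to convert RGB to grayscale efficiently, so I took a function from here that explains how to convert RGBA to grayscale. Now I am trying to do the same thing with plain RGB input. I changed a few things, but it does not seem to work properly, and I cannot see why. Can anyone spot my mistake?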

void neon_asm_convert(uint8_t * __restrict dest, uint8_t * __restrict src, int numPixels)
{
    __asm__ volatile(
     "lsr %2, %2, #3 \n"
     "# build the three constants:  \n"
     "mov r4, #28                   \n" // Blue channel multiplier
     "mov r5, #151                  \n" // Green channel multiplier
     "mov r6, #77                   \n" // Red channel multiplier
     "vdup.8 d4, r4                 \n"
     "vdup.8 d5, r5                 \n"
     "vdup.8 d6, r6                 \n"
     "0: \n"
     "# load 8 pixels: \n"  //RGBR
     "vld4.8 {d0-d3}, [%1]! \n"
     "# do the weight average: \n"
     "vmull.u8 q7, d0, d4 \n"
     "vmlal.u8 q7, d1, d5 \n"
     "vmlal.u8 q7, d2, d6 \n"
     "# shift and store: \n"
     "vshrn.u16 d7, q7, #8 \n" // Divide q7 by 256 and store the narrowed result in d7
     "vst1.8 {d7}, [%0]! \n"
     "subs %2, %2, #1 \n" // Decrement iteration count

     "# load 8 pixels: \n"
     "vld4.8 {d8-d11}, [%1]! \n" //Other GBRG
     "# do the weight average: \n"
     "vmull.u8 q7, d3, d4 \n"
     "vmlal.u8 q7, d8, d5 \n"
     "vmlal.u8 q7, d9, d6 \n"
     "# shift and store: \n"
     "vshrn.u16 d7, q7, #8 \n" // Divide q7 by 256 and store the narrowed result in d7
     "vst1.8 {d7}, [%0]! \n"
     "subs %2, %2, #1 \n" // Decrement iteration count

     "# load 8 pixels: \n"
     "vld4.8 {d0-d3}, [%1]! \n"
     "# do the weight average: \n"
     "vmull.u8 q7, d10, d4 \n"
     "vmlal.u8 q7, d11, d5 \n"
     "vmlal.u8 q7, d0, d6 \n"
     "# shift and store: \n"
     "vshrn.u16 d7, q7, #8 \n" // Divide q7 by 256 and store the narrowed result in d7
     "vst1.8 {d7}, [%0]! \n"
     "subs %2, %2, #1 \n" // Decrement iteration count


     "# do the weight average: \n"
     "vmull.u8 q7, d1, d4 \n"
     "vmlal.u8 q7, d2, d5 \n"
     "vmlal.u8 q7, d3, d6 \n"
     "# shift and store: \n"
     "vshrn.u16 d7, q7, #8 \n" // Divide q7 by 256 and store the narrowed result in d7
     "vst1.8 {d7}, [%0]! \n"

     "subs %2, %2, #1 \n" // Decrement iteration count



     "bne 0b \n" // Repeat until the iteration count reaches zero
     :
     : "r"(dest), "r"(src), "r"(numPixels)
     : "r4", "r5", "r6"
    );
}

Best answer

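You should use "vld3.8 {d0-d2}, [%1]!\n". The vld4.8 load de-interleaves four bytes per pixel, which matches RGBA; packed RGB has only three bytes per pixel, so with vld4.8 the channels end up mixed across d0-d3. The three-way vld3.8 load reads 24 bytes and places the eight values of each channel in d0, d1 and d2.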
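As an illustration of the three-way load, here is a minimal sketch using NEON intrinsics instead of inline assembly. It is not the asker's function: the name rgb_to_gray is hypothetical, the input is assumed to be packed 8-bit pixels stored in R, G, B order (swap the weights for B, G, R data), and the weights 77/151/28 are the ones from the question. When NEON is not available, only the scalar loop runs, so the same code can be checked on any machine.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: packed R,G,B bytes in, one gray byte per pixel out.
   Assumes memory order R, G, B; swap the weights for B, G, R data. */
void rgb_to_gray(uint8_t *dest, const uint8_t *src, size_t num_pixels)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    const uint8x8_t wr = vdup_n_u8(77);   /* red weight   */
    const uint8x8_t wg = vdup_n_u8(151);  /* green weight */
    const uint8x8_t wb = vdup_n_u8(28);   /* blue weight  */
    for (; i + 8 <= num_pixels; i += 8) {
        /* vld3 reads 24 bytes and splits them into three 8-lane channels. */
        uint8x8x3_t px = vld3_u8(src + 3 * i);
        uint16x8_t acc = vmull_u8(px.val[0], wr);
        acc = vmlal_u8(acc, px.val[1], wg);
        acc = vmlal_u8(acc, px.val[2], wb);
        /* Weights sum to 256, so shifting right by 8 normalizes the result. */
        vst1_u8(dest + i, vshrn_n_u16(acc, 8));
    }
#endif
    /* Scalar tail (and full fallback when NEON is unavailable). */
    for (; i < num_pixels; ++i) {
        const uint8_t *p = src + 3 * i;
        dest[i] = (uint8_t)((77 * p[0] + 151 * p[1] + 28 * p[2]) >> 8);
    }
}
```

Unlike the loop in the question, this version also handles pixel counts that are not a multiple of 8, because the leftover pixels go through the scalar tail.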

See also http://hilbert-space.de/?p=22
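To see why a four-way load misbehaves on three-byte pixels, the de-interleaving can be modelled in plain C. This is only a scalar model for illustration; the function name deinterleave8 is made up here, and it mimics how vld3.8 or vld4.8 fills D registers: lane i of register c receives byte src[i * stride + c].

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a NEON de-interleaving load into 8-lane registers.
   stride = 3 mimics vld3.8, stride = 4 mimics vld4.8.
   The caller must supply at least 8 * stride readable bytes in src. */
void deinterleave8(uint8_t regs[][8], const uint8_t *src, size_t stride)
{
    for (size_t lane = 0; lane < 8; ++lane)
        for (size_t c = 0; c < stride; ++c)
            regs[c][lane] = src[lane * stride + c];
}
```

If src holds packed RGB data, calling this with stride 3 leaves only red bytes in regs[0], only green in regs[1] and only blue in regs[2]. With stride 4, regs[0] receives the bytes at offsets 0, 4, 8, 12, and so on, which on RGB data is the sequence R, G, B, R, G, B, R, G, so each register holds a mix of channels and the weighted sum no longer corresponds to a single pixel.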

This question and answer come from a similar question found on Stack Overflow: https://stackoverflow.com/questions/8501987/
