c++ - Extracting set byte positions from a SIMD vector

I run a series of computations using SIMD instructions. They return a 16-byte vector named compare as the result, in which every byte is either 0x00 or 0xff:

             0    1    2    3    4    5    6    7       14   15
compare : 0x00 0x00 0x00 0x00 0xff 0x00 0x00 0x00 ... 0x00 0xff

A byte set to 0xff means that I need to run the function do_operation(i), where i is the position of that byte.

For example, the compare vector above means that I need to run this sequence of operations:

do_operation(4);
do_operation(15);
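As a baseline, the required behaviour can be written as a plain scalar loop over the 16 bytes. The following is a minimal sketch. It assumes the vector has already been stored to a std::array<std::uint8_t, 16>. The calls vector and this do_operation body are hypothetical stand-ins that only record the positions passed in. They are not part of the question's code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the question's do_operation(i):
// it only records which byte positions it was called with.
std::vector<int> calls;

void do_operation(int i) {
    assert(i >= 0 && i < 16);  // must be a byte position in a 16-byte vector
    calls.push_back(i);
}

// Scalar reference: call do_operation(i) for every byte equal to 0xff,
// in increasing position order. Handy for checking faster versions.
void process_scalar(const std::array<std::uint8_t, 16>& compare) {
    for (int i = 0; i < 16; ++i) {
        if (compare[i] == 0xff) {
            do_operation(i);
        }
    }
}
```

Suppose compare[4] and compare[15] are set to 0xff and every other byte is 0x00. Then this makes exactly the two calls do_operation(4) and do_operation(15), as in the example above.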

This is the fastest solution I have come up with so far:

for(...) {
        //
        // SIMD computations
        //
        __m128i compare = ... // Result of SIMD computations

        // Extract high and low quadwords for compare vector
        std::uint64_t cmp_low = (_mm_cvtsi128_si64(compare));
        std::uint64_t cmp_high = (_mm_extract_epi64(compare, 1));

        //  Process low quadword 
        if (cmp_low) {
            const std::uint64_t low_possible_positions = 0x0706050403020100;
            const std::uint64_t match_positions = _pext_u64(
                    low_possible_positions, cmp_low);
            const int match_count = _popcnt64(cmp_low) / 8;
            const std::uint8_t* match_pos_array =
                    reinterpret_cast<const std::uint8_t*>(&match_positions);

            for (int i = 0; i < match_count; ++i) {
                do_operation(match_pos_array[i]);  // position of the i-th 0xff byte
            }
        }

        // Process high quadword (similarly)
        if (cmp_high) { 

            const std::uint64_t high_possible_positions = 0x0f0e0d0c0b0a0908;
            const std::uint64_t match_positions = _pext_u64(
                    high_possible_positions, cmp_high);
            const int match_count = _popcnt64(cmp_high) / 8;
            const std::uint8_t* match_pos_array =
                    reinterpret_cast<const std::uint8_t*>(&match_positions);

            for(int i = 0; i < match_count; ++i) {
                do_operation(match_pos_array[i]);  // position of the i-th 0xff byte
            }
        }
}

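I first extract the first and second 64-bit integers of the 128-bit vector (cmp_low and cmp_high). Then I use popcount to compute the number of bytes set to 0xff (the number of set bits divided by 8). Finally, I use pext to gather the positions of the non-zero bytes, like this: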

0x0706050403020100
0x000000ff00ff0000
        |
      PEXT
        |
0x0000000000000402
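The PEXT step can be checked on paper or on a machine without BMI2. The following sketch uses a hypothetical helper, pext_u64_soft. It is a bit-by-bit software emulation of _pext_u64, intended only for illustration and much slower than the real instruction. A second hypothetical helper, count_set_bytes, does the popcount / 8 step. Together they reproduce the worked example above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software emulation of BMI2 PEXT: the bits of src selected by
// mask are packed, in order, into the low bits of the result.
std::uint64_t pext_u64_soft(std::uint64_t src, std::uint64_t mask) {
    std::uint64_t result = 0;
    int out_bit = 0;
    for (int bit = 0; bit < 64; ++bit) {
        if (mask & (std::uint64_t{1} << bit)) {
            if (src & (std::uint64_t{1} << bit)) {
                result |= std::uint64_t{1} << out_bit;
            }
            ++out_bit;
        }
    }
    return result;
}

// Number of 0xff bytes in a quadword whose bytes are all 0x00 or 0xff:
// count the set bits (Kernighan's loop) and divide by 8.
int count_set_bytes(std::uint64_t q) {
    int bits = 0;
    for (; q != 0; q &= q - 1) {
        ++bits;
    }
    assert(bits % 8 == 0);  // holds only for all-0x00/0xff bytes
    return bits / 8;
}
```

Take the example above. pext_u64_soft(0x0706050403020100, 0x000000ff00ff0000) returns 0x0402, and count_set_bytes(0x000000ff00ff0000) returns 2. Read as bytes on a little-endian machine such as x86, the result gives match_pos_array[0] == 2 and match_pos_array[1] == 4.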

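I would like to find a faster way to extract the positions of the bytes set to 0xff in the compare vector. More precisely, usually only 0, 1 or 2 bytes of compare are set to 0xff, and I would like to use this information to avoid some branches.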

Best answer

Here is a brief outline of how to reduce the number of tests:

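First, project one bit of each byte of the 128-bit integer into a 16-bit value. Because every byte is 0x00 or 0xff, the most significant bit is enough. On x86 CPUs the SSE2 instruction pmovmskb does exactly this. The Intel, Microsoft, GCC and Clang compilers expose it as the _mm_movemask_epi8 intrinsic, and in GCC this intrinsic wraps the builtin __builtin_ia32_pmovmskb128. (_mm_movemask_pi8 is the 64-bit MMX variant.)
Then split that value into 4 nibbles.
Define a function to handle each possible nibble value (0 to 15) and put these functions in an array.
Finally, for each nibble, call the function indexed by its value, passing an extra argument that indicates which of the four nibbles of the 16-bit value it is.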
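The four steps above can be sketched as follows. This is one possible reading of the outline, not code from the answer. The 16 handlers are generated from a hypothetical template handle_nibble<N> instead of being written by hand. The hypothetical dispatch_mask takes the 16-bit mask directly, and do_operation is again a recording stand-in. The SSE2 entry point process_vector is compiled only when SSE2 is available.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#endif

// Hypothetical stand-in for the question's do_operation(i).
std::vector<int> calls;

void do_operation(int i) {
    assert(i >= 0 && i < 16);
    calls.push_back(i);
}

// Step 3: one handler per nibble value N (0..15). N is a compile-time
// constant, so each instantiation reduces to 0 to 4 direct calls.
template <unsigned N>
void handle_nibble(int base) {
    if (N & 1u) do_operation(base + 0);
    if (N & 2u) do_operation(base + 1);
    if (N & 4u) do_operation(base + 2);
    if (N & 8u) do_operation(base + 3);
}

using NibbleHandler = void (*)(int);

const NibbleHandler kHandlers[16] = {
    handle_nibble<0>,  handle_nibble<1>,  handle_nibble<2>,  handle_nibble<3>,
    handle_nibble<4>,  handle_nibble<5>,  handle_nibble<6>,  handle_nibble<7>,
    handle_nibble<8>,  handle_nibble<9>,  handle_nibble<10>, handle_nibble<11>,
    handle_nibble<12>, handle_nibble<13>, handle_nibble<14>, handle_nibble<15>,
};

// Steps 2 and 4: split the 16-bit mask into 4 nibbles and call the handler
// indexed by each nibble, passing the nibble's first byte position (0, 4, 8, 12).
void dispatch_mask(unsigned mask16) {
    for (int n = 0; n < 4; ++n) {
        kHandlers[(mask16 >> (4 * n)) & 0xFu](4 * n);
    }
}

#if defined(__SSE2__) || defined(_M_X64)
// Step 1: pmovmskb sets bit i of the result to the MSB of byte i.
void process_vector(__m128i compare) {
    dispatch_mask(static_cast<unsigned>(_mm_movemask_epi8(compare)));
}
#endif
```

For the question's example, the mask is 0x8010 (bits 4 and 15 set). Nibble 1 has the value 0x1 and calls do_operation(4), and nibble 3 has the value 0x8 and calls do_operation(15). The outer loop always runs exactly four times. The data-dependent choice moves into the indirect call through kHandlers, instead of a loop whose trip count depends on the match count.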

Original question on Stack Overflow: https://stackoverflow.com/questions/28506500/
