c++ - 对签名数据进行逻辑右移

标签 c++ c x86 bit-manipulation bit-shift

首先,不,这不是我的家庭作业,这是一本名为 Computer Systems A Programmer's Perspective 的书给出的实验。 (顺便说一句,很棒的书)

我需要对有符号整数执行逻辑右移而不使用以下任何内容:

  • 选角
  • if , while , switch , for , do - while , ? :
  • 任何类型的指针

允许的运算符是: ! + ~ | >> << ^

到目前为止我尝试了什么?

/* 
 * logicalShift - shift x to the right by n, using a logical shift
 *   Can assume that 0 <= n <= 31
 *   Examples: logicalShift(0x87654321,4) = 0x08765432
 *   Legal ops: ~ & ^ | + << >>
 *   Max ops: 20
 *   Rating: 3 
 */
int logicalShift(int x, int n) {
    int mask = ~0;
    int shiftAmount = 31 + ((~n)+1);//this evaluates to 31 - n on two's complement machines
    mask = mask << shiftAmount;
    mask = ~mask;//If n equals 0, it means we have negated all bits and hence have mask = 0
    x = x >> n;
    return x & mask; 
}

只要 n 就可以正常工作不等于 0 ,当它这样做时,此代码会中断,因为掩码将变为全 0,并且该函数将返回 0 .

如果有任何正确方向的提示而不是完整的代码,我将不胜感激。

再说一遍,这不是作业;实验室作业可在此处公开获得http://csapp.cs.cmu.edu/public/labs.html

PS:不是重复的,不要发布涉及转换为 unsigned 然后转移的解决方案。

最佳答案

这道题是搜索C++逻辑移位的第一个结果。

因此,回答一般情况也是有意义的,其中允许cast - 因为此处显示的代码均未编译(GCC 9.2,-O3)使用正确(且快速)的操作码(仅一个 shr 指令而不是 sar )。

类型转换

此版本也适用于即将推出的 int128_t (目前在 GCC、Clang 和 ICC 中为 __int128)和其他 future 类型。如果您被允许使用 type_traits并且您的代码将来应该可以正常工作,而无需考虑什么是正确的无符号类型,您应该将其用于正确的转换。

代码

#include <type_traits>

template<class T>
inline T logicalShift(T t1, T t2) {
  return 
    static_cast<
      typename std::make_unsigned<T>::type
    >(t1) >> t2;
}

汇编代码分析

将其打包成 f(T x, T y) { return logicalShift(x, y); }导致以下汇编器指令(GCC9.2,-O3):

f(int, int):
        mov     eax, edi
        mov     ecx, esi
        shr     eax, cl
        ret
f(unsigned int, unsigned int):
        mov     eax, edi
        mov     ecx, esi
        shr     eax, cl
        ret
f(__int128, __int128):
        mov     rcx, rdx
        mov     rax, rdi
        mov     rdx, rsi
        shrd    rax, rsi, cl
        shr     rdx, cl
        xor     esi, esi
        and     ecx, 64
        cmovne  rax, rdx
        cmovne  rdx, rsi
        ret
f(unsigned __int128, unsigned __int128):
        mov     rcx, rdx
        mov     rax, rdi
        mov     rdx, rsi
        shrd    rax, rsi, cl
        shr     rdx, cl
        xor     esi, esi
        and     ecx, 64
        cmovne  rax, rdx
        cmovne  rdx, rsi
        ret

同时 T a, b; T c = a >> b;结果:

f(int, int):
        mov     eax, edi      # 32-bit registers
        mov     ecx, esi
        sar     eax, cl       # lower 8-bit of cx register
        ret
f(long, long):
        mov     rax, rdi      # 64-bit registers
        mov     ecx, esi
        sar     rax, cl
        ret
f(unsigned int, unsigned int):
        mov     eax, edi
        mov     ecx, esi
        shr     eax, cl
        ret
f(__int128, __int128):
        mov     rcx, rdx
        mov     rdx, rsi
        mov     rax, rdi
        sar     rdx, cl
        shrd    rax, rsi, cl
        mov     rsi, rdx
        sar     rsi, 63
        and     ecx, 64
        cmovne  rax, rdx
        cmovne  rdx, rsi
        ret

我们看到,区别主要只有shr而不是 sar (还有一些用于 __int128)。什么代码可以更快?


非 Actor 版本

(精简指令集为 ~ & ^ | + << >>)

问题 - 左移(SALSHL)

@Fingolfin 的初衷还是不错的。但是我们的处理器不会按照我们对 int mask = ~0 << n 的第一想法去做。对于 n >= 32; ,但为什么呢?

C++ 标准(草案 N4713、8.5.7、第 2 版)针对 <<:

The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.

听起来像(E1 << E2) % (UINTxx_MAX + 1) ,我们只是从右边用 0 填充并用模运算切断前导位。简单明了。

分析左移

16 位 short、32 位 int 和 64 位 long 的汇编代码(GCC 9.2,-O3)是:

g(short, short):
        movsx   eax, di    # 16-bit to 32-bit register
        mov     ecx, esi
        sal     eax, cl    # 1st is 32-bit, 2nd is 8-bit register
        ret
g(int, int):
        mov     eax, edi   # 32-bit registers
        mov     ecx, esi
        sal     eax, cl    # 1st is 32-bit, 2nd is 8-bit register
        ret
g(long, long):
        mov     rax, rdi   # 64-bit registers
        mov     ecx, esi
        sal     rax, cl    # 1st is 64-bit, 2nd is 8-bit register
        ret

因此,我们讨论了我们从 ~0 << i 断言的内容对于 int i = 0; i <= 33; i++ ,但我们真正得到了什么?

 0: 11111111111111111111111111111111
 1: 11111111111111111111111111111110
 2: 11111111111111111111111111111100
 3: 11111111111111111111111111111000
 4: 11111111111111111111111111110000
 5: 11111111111111111111111111100000
 6: 11111111111111111111111111000000
 7: 11111111111111111111111110000000
 8: 11111111111111111111111100000000
 9: 11111111111111111111111000000000
10: 11111111111111111111110000000000
11: 11111111111111111111100000000000
12: 11111111111111111111000000000000
13: 11111111111111111110000000000000
14: 11111111111111111100000000000000
15: 11111111111111111000000000000000
16: 11111111111111110000000000000000
17: 11111111111111100000000000000000
18: 11111111111111000000000000000000
19: 11111111111110000000000000000000
20: 11111111111100000000000000000000
21: 11111111111000000000000000000000
22: 11111111110000000000000000000000
23: 11111111100000000000000000000000
24: 11111111000000000000000000000000
25: 11111110000000000000000000000000
26: 11111100000000000000000000000000
27: 11111000000000000000000000000000
28: 11110000000000000000000000000000
29: 11100000000000000000000000000000
30: 11000000000000000000000000000000
31: 10000000000000000000000000000000
32: 11111111111111111111111111111111
33: 11111111111111111111111111111110

我们看到,结果更像是~0 << (i%2^5) .

所以,看看长(又名长 int64_t)案例:(使用 MSVC 为 x86 编译)

 0: 1111111111111111111111111111111111111111111111111111111111111111
 1: 1111111111111111111111111111111111111111111111111111111111111110
 2: 1111111111111111111111111111111111111111111111111111111111111100
 3: 1111111111111111111111111111111111111111111111111111111111111000
 4: 1111111111111111111111111111111111111111111111111111111111110000
 5: 1111111111111111111111111111111111111111111111111111111111100000
 6: 1111111111111111111111111111111111111111111111111111111111000000
 7: 1111111111111111111111111111111111111111111111111111111110000000
 8: 1111111111111111111111111111111111111111111111111111111100000000
 9: 1111111111111111111111111111111111111111111111111111111000000000
10: 1111111111111111111111111111111111111111111111111111110000000000
11: 1111111111111111111111111111111111111111111111111111100000000000
12: 1111111111111111111111111111111111111111111111111111000000000000
13: 1111111111111111111111111111111111111111111111111110000000000000
14: 1111111111111111111111111111111111111111111111111100000000000000
15: 1111111111111111111111111111111111111111111111111000000000000000
16: 1111111111111111111111111111111111111111111111110000000000000000
17: 1111111111111111111111111111111111111111111111100000000000000000
18: 1111111111111111111111111111111111111111111111000000000000000000
19: 1111111111111111111111111111111111111111111110000000000000000000
20: 1111111111111111111111111111111111111111111100000000000000000000
21: 1111111111111111111111111111111111111111111000000000000000000000
22: 1111111111111111111111111111111111111111110000000000000000000000
23: 1111111111111111111111111111111111111111100000000000000000000000
24: 1111111111111111111111111111111111111111000000000000000000000000
25: 1111111111111111111111111111111111111110000000000000000000000000
26: 1111111111111111111111111111111111111100000000000000000000000000
27: 1111111111111111111111111111111111111000000000000000000000000000
28: 1111111111111111111111111111111111110000000000000000000000000000
29: 1111111111111111111111111111111111100000000000000000000000000000
30: 1111111111111111111111111111111111000000000000000000000000000000
31: 1111111111111111111111111111111110000000000000000000000000000000
32: 1111111111111111111111111111111100000000000000000000000000000000
33: 1111111111111111111111111111111000000000000000000000000000000000
34: 1111111111111111111111111111110000000000000000000000000000000000
35: 1111111111111111111111111111100000000000000000000000000000000000
36: 1111111111111111111111111111000000000000000000000000000000000000
37: 1111111111111111111111111110000000000000000000000000000000000000
38: 1111111111111111111111111100000000000000000000000000000000000000
39: 1111111111111111111111111000000000000000000000000000000000000000
40: 1111111111111111111111110000000000000000000000000000000000000000
41: 1111111111111111111111100000000000000000000000000000000000000000
42: 1111111111111111111111000000000000000000000000000000000000000000
43: 1111111111111111111110000000000000000000000000000000000000000000
44: 1111111111111111111100000000000000000000000000000000000000000000
45: 1111111111111111111000000000000000000000000000000000000000000000
46: 1111111111111111110000000000000000000000000000000000000000000000
47: 1111111111111111100000000000000000000000000000000000000000000000
48: 1111111111111111000000000000000000000000000000000000000000000000
49: 1111111111111110000000000000000000000000000000000000000000000000
50: 1111111111111100000000000000000000000000000000000000000000000000
51: 1111111111111000000000000000000000000000000000000000000000000000
52: 1111111111110000000000000000000000000000000000000000000000000000
53: 1111111111100000000000000000000000000000000000000000000000000000
54: 1111111111000000000000000000000000000000000000000000000000000000
55: 1111111110000000000000000000000000000000000000000000000000000000
56: 1111111100000000000000000000000000000000000000000000000000000000
57: 1111111000000000000000000000000000000000000000000000000000000000
58: 1111110000000000000000000000000000000000000000000000000000000000
59: 1111100000000000000000000000000000000000000000000000000000000000
60: 1111000000000000000000000000000000000000000000000000000000000000
61: 1110000000000000000000000000000000000000000000000000000000000000
62: 1100000000000000000000000000000000000000000000000000000000000000
63: 1000000000000000000000000000000000000000000000000000000000000000
64: 0000000000000000000000000000000000000000000000000000000000000000
65: 0000000000000000000000000000000000000000000000000000000000000000

轰!

(同样直到 31 也适用于 GCC 的简称,因为它使用 32 位 EAX 注册 sal)

但是,此结果仅由编译器创建:

x86 msvc v19.22,/O2:
_x$ = 8                                       ; size = 8
_y$ = 16                                                ; size = 8
__int64 g(__int64,__int64) PROC                                  ; g, COMDAT
        mov     eax, DWORD PTR _x$[esp-4]
        mov     edx, DWORD PTR _x$[esp]
        mov     ecx, DWORD PTR _y$[esp-4]
        jmp     __allshl
__int64 g(__int64,__int64) ENDP  
x64 msvc v19.22,/O2:
x$ = 8
y$ = 16
__int64 g(__int64,__int64) PROC                                  ; g, COMDAT
        mov     rax, rcx
        mov     rcx, rdx
        shl     rax, cl
        ret     0
__int64 g(__int64,__int64) ENDP      

并且 x64 MSVC 代码显示与 GCC 9.2 代码相同的行为 - 使用 shl而不是 sal .

从那时起,我们现在知道处理器本身(第 6 代英特尔酷睿)仅使用 cl 的最后一位数字。寄存器,取决于移位操作的第一个寄存器的长度,即使 C++ 标准另有说明。

一个小修复

所以,这就是代码中断的地方。本能地我会采取 32 - n 的 shiftAmount|并进入上面的问题,你已经通过使用 shiftAmount 避免了这个问题的 31 - n .知道 n 是 0..31,就没有 shiftAmount 为 32。很好。

但是,一方面减少意味着另一方面增加。 mask现在需要从 -2 开始(我们不转移 0b1111 ,我们转移 0b1110 ):

int logSh3(int x, int n) {
    int mask = ~2 + 1;
    int shiftAmount = 31 + ((~n) + 1);//this evaluates to 31 - n on two's complement machines
    mask = mask << shiftAmount;
    mask = ~mask;//If n equals 0, it means we have negated all bits and hence have mask = 0
    x = x >> n;
    return x & mask;
}

而且有效!

替代代码与分析

Fingolfins 代码(固定)

上层代码,作为汇编程序(GCC 9.2,-O3):

logSh3(int, int):
        mov     ecx, 31
        mov     edx, -2
        mov     eax, edi
        sub     ecx, esi
        sal     edx, cl
        mov     ecx, esi
        not     edx
        sar     eax, cl
        and     eax, edx
        ret

9条指令

哈罗德代码

int logSh2(int x, int n) {
    int shiftAmount = 31 + ((~n) + 1);//this evaluates to 31 - n on two's complement machines
    int mask = 1 << shiftAmount;
    mask |= mask + ((~1) + 1);
    x = x >> n;
    return x & mask;
}
logSh2(int, int):
        mov     ecx, esi
        mov     r8d, edi
        mov     edi, -2147483648
        shr     edi, cl
        sar     r8d, cl
        lea     eax, [rdi-1]
        or      eax, edi
        and     eax, r8d
        ret

8条指令

我们可以做得更好吗?

另一种解决方案

我们可以右移 0b1000 而不是左移, 将其向后移一并取反。

int logSh4(int x, int n) {
    int mask = 0x80000000;
    mask = mask >> n;
    mask = mask << 1;
    mask = ~mask;//If n equals 0, it means we have negated all bits and hence have mask = 0
    x = x >> n;
    return x & mask;
}
logSh4(int, int):
        mov     ecx, esi
        mov     edx, -2147483648
        sar     edx, cl
        sar     edi, cl
        lea     eax, [rdx+rdx]
        not     eax
        and     eax, edi
        ret

7 说明

更好的方法?

让我们转移0b0111向右,将它向左移一位并加 1。所以我们省去逆运算:

int logSh5(int x, int n) {
    int mask = 0x7fffffff;
    mask = mask >> n;
    mask = (mask << 1) | 1;
    x = x >> n;
    return x & mask;
}
logSh5(int, int):
        mov     ecx, esi
        mov     eax, 2147483647
        sar     eax, cl
        sar     edi, cl
        lea     eax, [rax+1+rax]
        and     eax, edi
        ret

还剩 6 条指令。美好的。 (但简单的转换仍然是实践中最好的解决方案)

关于c++ - 对签名数据进行逻辑右移,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/13221369/

相关文章:

c - c中的不可变字符串的内存分配在哪里?

C 程序 - 如何初始化长数组

C 气泡问题

c++ - 为什么 std::fill(0) 比 std::fill(1) 慢?

assembly - 为什么间接寻址中的索引器必须是双字?

c++ - MFC - 如何将 WCHAR 转换为 CString?

c++ - 链接器错误 : _main already defined in *. obj

c++ - 删除 std::tuple 的第一种类型

architecture - 指令指针是程序可见的寄存器吗?

c++ - 如何在 C++ 中将 uint8_t 和 uint16_t 转换为 float ?