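delphi - Multiplying two UInt32s to get a UInt64 without widening

For my BigIntegers, in the PUREPASCAL implementation (i.e. no assembler allowed), I have to multiply two UInt32s to get a UInt64 result.

The usual way to do that is to widen at least one of the operands, so that you get a 64 bit multiplication: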

Res := UInt64(A) * B;

where Res is a UInt64, and A and B are UInt32.

But in Win32 this produces a rather unwieldy piece of machine code:

MulTest.dpr.431: Res := UInt64(A) * B;
004DB463 8B45F8           mov eax,[ebp-$08]  // load A 
004DB466 33D2             xor edx,edx        // make it UInt64
004DB468 52               push edx           // push A
004DB469 50               push eax
004DB46A 8B45FC           mov eax,[ebp-$04]  // load B
004DB46D 33D2             xor edx,edx        // make it UInt64 
004DB46F E87C0AF3FF       call @_llmul       // 64 bit multiplication
004DB474 8945E8           mov [ebp-$18],eax  // store 64 bit result
004DB477 8955EC           mov [ebp-$14],edx

Now, if you do:

Res := A * B;

you unfortunately get a 32 bit intermediate result (the top 32 bits of the actual result are simply zeroed out):

MulTest.dpr.435: Res := A * B;
004DB4BD 8B45FC           mov eax,[ebp-$04]
004DB4C0 F76DF8           imul dword ptr [ebp-$08]
004DB4C3 33D2             xor edx,edx              // zero out top 32 bits
004DB4C5 8945E8           mov [ebp-$18],eax
004DB4C8 8955EC           mov [ebp-$14],edx

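Now, if the line xor edx,edx were not there, the result would be exactly what I need. It would be more than twice as fast (i.e. take less than half the time) as the widened version with the UInt64 cast.

Question: does anyone know of a pseudo-function, trick or cast that does not discard the top 32 bits of the 64 bit result? I know how to do this in assembler, but this must be PUREPASCAL (it should work on other platforms too).

I managed to make 32 bit addition faster in PUREPASCAL by accessing the arrays of 32 bit unsigned integers (which make up a BigInteger) as arrays of unsigned 16 bit integers and adding those. So I also tried using 16 bit intermediate results for the multiplication: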

// Too slow: in a test, 2973 ms for MulU32ToU64(A, B) vs 1432 ms for UInt64(A) * B.
function MulU32ToU64(L, R: UInt32): UInt64; inline;
var
  L0R0, L0R1, L1R0, L1R1, Sum: UInt32;
type
  TUInt64 = packed record
    case Byte of
      0: (L0, L1, L2, L3: UInt16);
      1: (I0, I1: UInt32);
  end;
  TUInt32 = packed record
    Lo, Hi: Word;
  end;
begin
  L0R0 := TUInt32(L).Lo * TUInt32(R).Lo;
  L0R1 := TUInt32(L).Lo * TUInt32(R).Hi;
  L1R0 := TUInt32(L).Hi * TUInt32(R).Lo;
  L1R1 := TUInt32(L).Hi * TUInt32(R).Hi;
  TUInt64(Result).L0 := TUInt32(L0R0).Lo;
  Sum := UInt32(TUInt32(L0R0).Hi) + TUInt32(L1R0).Lo + TUInt32(L0R1).Lo;
  TUInt64(Result).L1 := TUInt32(Sum).Lo;
  Sum := UInt32(TUInt32(Sum).Hi) + TUInt32(L1R0).Hi + TUInt32(L0R1).Hi + L1R1;
  TUInt64(Result).I1 := Sum;
end;

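It gives the correct result, but it is more than twice as slow as UInt64(A) * B. That is not surprising, since it performs 4 UInt32 multiplications and a lot of additions, which makes it slower than the code that uses System.__llmul.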
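For reference, the 16 bit approach above rests on the schoolbook identity L * R = (L1 * R1) shl 32 + (L1 * R0 + L0 * R1) shl 16 + L0 * R0, where L0/L1 and R0/R1 are the low and high 16 bit halves of the operands. The following is a minimal sketch of the same idea using shifts and masks instead of record casts. The name MulU32ToU64Shift and all local names are hypothetical, and it is not expected to be any faster, since it still needs four multiplications.

```pascal
// Hypothetical sketch: 32 x 32 -> 64 bit multiplication from 16 bit halves.
function MulU32ToU64Shift(L, R: UInt32): UInt64;
var
  L0, L1, R0, R1: UInt32;        // 16 bit halves of the operands
  P00, P01, P10, P11: UInt32;    // partial products, each fits in 32 bits
  Mid, ResLo, ResHi: UInt32;
begin
  L0 := L and $FFFF;
  L1 := L shr 16;
  R0 := R and $FFFF;
  R1 := R shr 16;

  P00 := L0 * R0;
  P01 := L0 * R1;
  P10 := L1 * R0;
  P11 := L1 * R1;

  // Middle column: high half of P00 plus the low halves of the cross products.
  Mid := (P00 shr 16) + (P01 and $FFFF) + (P10 and $FFFF);

  // Low 32 bits: low half of P00 and low half of the middle column.
  ResLo := (P00 and $FFFF) or (Mid shl 16);

  // High 32 bits: P11, high halves of the cross products, and the carry from Mid.
  ResHi := P11 + (P01 shr 16) + (P10 shr 16) + (Mid shr 16);

  Result := (UInt64(ResHi) shl 32) or ResLo;
end;
```

None of the intermediate sums can exceed 32 bits, so the sketch should also behave with overflow checking enabled.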

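Update

As @J... pointed out, Delphi usually uses IMUL, which performs a signed multiplication. So multiplying, for example, $00000002 and $FFFFFFFF results in EAX = $FFFFFFFE and EDX = $FFFFFFFF (in other words, an Int64 with the value -2), whereas I need EAX = $FFFFFFFE (the same) but EDX = $00000001 (together a UInt64 with the value $00000001FFFFFFFE). So discarding the top 32 bits is correct, and there seems to be no way to force Delphi to use MUL and keep the top 32 bits of its result.

Best answer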
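The difference between the two interpretations can be reproduced in plain Pascal. This is only an illustrative sketch: SignedProduct and UnsignedProduct are hypothetical names, and the first function merely models what the EDX:EAX pair left by IMUL would mean, it does not change what the compiler emits.

```pascal
// Models the 64 bit value IMUL leaves in EDX:EAX: both operands are
// reinterpreted as signed 32 bit values before multiplying.
function SignedProduct(A, B: UInt32): Int64;
begin
  Result := Int64(Int32(A)) * Int64(Int32(B));
end;

// The value actually wanted: both operands treated as unsigned.
function UnsignedProduct(A, B: UInt32): UInt64;
begin
  Result := UInt64(A) * B;
end;
```

For A = $FFFFFFFF and B = $00000002, SignedProduct yields -2 (EDX = $FFFFFFFF, EAX = $FFFFFFFE), while UnsignedProduct yields $00000001FFFFFFFE. The low 32 bits agree; only the high 32 bits differ.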

MulTest.dpr.435: Res := A * B;
004DB4BD 8B45FC           mov eax,[ebp-$04]
004DB4C0 F76DF8           imul dword ptr [ebp-$08]
004DB4C3 33D2             xor edx,edx              // zero out top 32 bits
004DB4C5 8945E8           mov [ebp-$18],eax
004DB4C8 8955EC           mov [ebp-$14],edx

Now, if the line xor edx,edx were not there, the result would be exactly what I need.

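No, that is not at all what you want. That is a signed multiplication, and the result is meaningless if you want an unsigned one. Take A := $FFFFFFFF and B := 2: the result of imul is EAX = FFFFFFFE and EDX = FFFFFFFF. This opcode is emitted even when both operands are unsigned. You need the mul instruction, not imul. I don't think the Delphi compiler ever emits mul from pure Pascal. From the documentation on * (emphasis mine):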

The value of x / y is of type Extended, regardless of the types of x and y. For other arithmetic operators, the result is of type Extended whenever at least one operand is a real; otherwise, the result is of type Int64 when at least one operand is of type Int64; otherwise, the result is of type Integer.

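Integer - signed. Given how much this depends on architecture-specific features, and given the shortcomings of the Delphi compiler, I think the only performant solution here is target-dependent assembly.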

function UMul3264(x, y : UInt32) : UInt64;
asm
  // Win32 register convention: x in EAX, y in EDX; result returned in EDX:EAX
  mul edx
end;
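Since the question also asks for a PUREPASCAL build that works on other platforms, one possible arrangement is to select the assembly version only on 32 bit x86 and fall back to the widening multiplication elsewhere. This is a sketch under assumptions: USE_ASM_UMUL and UMul3264Portable are hypothetical names, and it assumes the compiler defines CPU386 for 32 bit x86 targets and uses the default register calling convention.

```pascal
// Hypothetical switch: use assembly only on 32 bit x86 and only when
// PUREPASCAL is not defined.
{$IFNDEF PUREPASCAL}
  {$IFDEF CPU386}
    {$DEFINE USE_ASM_UMUL}
  {$ENDIF}
{$ENDIF}

function UMul3264Portable(x, y: UInt32): UInt64;
{$IFDEF USE_ASM_UMUL}
asm
  // x arrives in EAX, y in EDX; MUL leaves the unsigned 64 bit
  // product in EDX:EAX, which is where a UInt64 result is returned.
  mul edx
end;
{$ELSE}
begin
  // Portable fallback: widen one operand to force a 64 bit multiplication.
  Result := UInt64(x) * y;
end;
{$ENDIF}
```

Both branches should return $00000001FFFFFFFE for UMul3264Portable($FFFFFFFF, 2); only the generated code differs.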

Source: the corresponding question on Stack Overflow, https://stackoverflow.com/questions/45889835/
