assembly - __div_64_32 使用的算法

经过一些研究，我无法找出 __div_64_32 使用哪种除法算法Linux内核中的函数:

# arch/powerpc/boot/div64.S
#include "ppc_asm.h"

    .globl __div64_32
__div64_32:
    lwz r5,0(r3)    # get the dividend into r5/r6
    lwz r6,4(r3)
    cmplw   r5,r4
    li  r7,0
    li  r8,0
    blt 1f
    divwu   r7,r5,r4    # if dividend.hi >= divisor,
    mullw   r0,r7,r4    # quotient.hi = dividend.hi / divisor
    subf.   r5,r0,r5    # dividend.hi %= divisor
    beq 3f
1:  mr  r11,r5      # here dividend.hi != 0
    andis.  r0,r5,0xc000
    bne 2f
    cntlzw  r0,r5       # we are shifting the dividend right
    li  r10,-1      # to make it < 2^32, and shifting
    srw r10,r10,r0  # the divisor right the same amount,
    addc    r9,r4,r10   # rounding up (so the estimate cannot
    andc    r11,r6,r10  # ever be too large, only too small)
    andc    r9,r9,r10
    addze   r9,r9
    or  r11,r5,r11
    rotlw   r9,r9,r0
    rotlw   r11,r11,r0
    divwu   r11,r11,r9  # then we divide the shifted quantities
2:  mullw   r10,r11,r4  # to get an estimate of the quotient,
    mulhwu  r9,r11,r4   # multiply the estimate by the divisor,
    subfc   r6,r10,r6   # take the product from the divisor,
    add r8,r8,r11   # and add the estimate to the accumulated
    subfe.  r5,r9,r5    # quotient
    bne 1b
3:  cmplw   r6,r4
    blt 4f
    divwu   r0,r6,r4    # perform the remaining 32-bit division
    mullw   r10,r0,r4   # and get the remainder
    add r8,r8,r0
    subf    r6,r10,r6
4:  stw r7,0(r3)    # return the quotient in *r3
    stw r8,4(r3)
    mr  r3,r6       # return the remainder in r3
    blr

据我了解，除法分两步进行，但我对“商的估计”概念很困惑。

此外，考虑到函数代码，我不明白这个特定指令的作用:

andis.  r0,r5,0xc000 # why this 0xC000 value ?

有人对这段代码有更多解释吗？

最佳答案

主要功能是:

r5:r6 -> r7:r8 * r4 + r6

分解为:

0 Load:
    Load from memory
    Set the temp registers
    Make the first division (divwu  r7,r5,r4)

1 Divide Loop:
    Shift left 
    Take carre of carry
    Divide

2 Accumulate Loop:
    Accumulate
    If hight part != 0 goto 1

3 Last low division:
    Divide the low part the old way 32 bit / 32bit

4 Store result:

第一次除法(第 0 部分结束)后，被除数高位部分 (r5) 的余数低于除数 (r4)。我们不能再分割它了。我们必须改变它并记住所应用的改变。所以我们统计r5 cntlzw r0,r5中的前导0 。我们将股息转移这个金额，除以，转移回来。

使用的算法为 r5/r3 = ( r5 * 2^x/r3 )/2^x。困难在于进位、溢出...

你提出了一个很好的问题:为什么andis. r0,r5,0xc000 。来自文档:andis. rz,rv,i: rz16−31 = rv & (i << 16), rz32−63 = rz0−15 = 0 。来自Python:bin(0xc000 << 16) = 0b11000000000000000000000000000000 。所以它在这里是因为，如果设置了股息.hi 的第一个或第二个位，我们不应该移位。

好吧，第一个已设置，我们按零移动所以没用，我们只是浪费时间。如果设置了第二位...我的猜测是，在除法之后右移时我们可能会溢出一些东西。特别是因为我们在标签 1 的开头尽可能左移(使 x 尽可能大，从而减少循环次数)。

最后，他所说的估计器，我们(我)称之为累加器。他称它们为估算器，因为它们每次都会累积较小的数字。

雷恩波尔多致敬!

关于assembly - __div_64_32 使用的算法，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/48110819/

assembly - __div_64_32 使用的算法

上一篇：google-bigquery - Google Bigquery-通过命令行创建表架构

下一篇：java - 如何强制执行 Java API