floating-point - 在仅支持 32 位 float 的平台上,如何将 IEEE754 64 位 double 除以 1000?

标签 floating-point precision ieee-754 integer-overflow floating-point-conversion

我有一个电表通过 PROFIBUS 连接到 DCS(分布式控制系统)。该仪表 (Siemens Sentron PAC3200) 提供 IEEE 754 双倍 Wh(瓦特小时)计数。此外,计数器在 1.0e12 Wh 或 1,000 GWh 时溢出。 (剖面图:几年前,西门子开发实验室。“让我们看看,如何传输 40 位无符号整数值?让我们使用 double!”)

我的目标是以 kWh 精度一致地记录计数。

但是 DCS 仅支持单精度 float 。因此,如果我采取直接路线,即将数据压缩为 float ,那么 kWh 读数中将出现大约小数点后七位的错误,即最晚从大约 100,000,000 Wh 或 100 MWh 开始。目前的数量已经是 600 MWh,所以这不是可行的方法。

所以现在,我将尾数放入无符号双整数(UDINT,在此平台上为 32 位)并根据 IEEE 754 执行转换,这会产生正确的 Wh 值。然而,这需要 2^32 Wh 或约 4.3 GWh 的溢出,这只能维持我们不到十年。

由于我只需要 kWh 精度,因此我在转换初期就有了除以 1000 的想法。这将使变量溢出为 4,300 GWh,并且电表的内部计数器已溢出为 1,000 GWh。理论上,问题已解决。

由于 IEEE 754 是一种二进制浮点格式,所以我只能轻松除以 1024(右移 10 次),这会引入很大的错误。此后乘以 1.024 的校正因子只会在该平台上以单精度发生,从而使之前的努力无效。

另一种选择是从转换中输出“高”和“低”UDINT(以 Wh 为单位),然后我至少可以在理论上计算回 kWh,但这看起来很尴尬(而且-ful)。

我有一种微妙的感觉,我可能忽略了一些事情(可以这么说,单人群体思维);我对如何获得所传输的 double 值的 1/1000 的任何其他想法持开放态度。

谢谢并致以诚挚的问候

比约恩

P.S.:为了您的观看乐趣,这是基于 @EricPostpischil 的答案的解决方案 - 根据平台和任务的具体情况量身定制。使用的语言是符合 EN 61131-3 的 SCL(结构化控制语言),这是一种 Pascal 方言。

FUNCTION_BLOCK PAC3200KON_P

VAR_INPUT
    INH : DWORD;
    INL : DWORD;
END_VAR

VAR_OUTPUT
    OUT : UDINT;
    SGN : BOOL;
END_VAR

VAR
    significand:              UDINT;
    exponent, i, shift:       INT;
    sign:                     BOOL;
    d0, d1, y0, y1, r1, temp: DWORD;
END_VAR
(*
    Convert the energy count delivered by Siemens Sentron PAC3200
    (IEEE 754 binary64 format, a.k.a. double) into an UDINT.

    Peculiarities:
    - This hardware platform only supports binary32 (a.k.a. float).

    - The Sentron's internal counter overflows at 1.0e12 Wh (1000 GWh).

    - kWh resolution suffices.

    - If you converted the double directly to UDINT and divided by 1000
      afterwards, the range would be reduced to (2^32-1)/1000 GWh or about
      4.295 GWh.

    - This is why this function first divides the significand by 1000
      and then proceeds with conversion to UDINT. This expands the
      range to (2^32-1) GWh or about 4295 GWh, which isn't reachable in
      practice since the device's internal counter overflows before.

    Background:

    IEEE 754 binary64 bit assignment:

               High-Byte                         Low-Byte
    66665555555555444444444433333333 3322222222221111111111
    32109876543210987654321098765432 10987654321098765432109876543210
    GEEEEEEEEEEESSSSSSSSSSSSSSSSSSSS SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS

    G: sign (1: negative)
    E: exponent (biased; subtract 1023) (11 bits)
    S: significand (52 bits)
*)

(*
    significand: Bits 19...0 of high byte und complete low byte

    The significand is initially divided by 1000 using integer division. The
    bits are divided into two parts:

    - d1 contains the 31 most significant bits (plus leading 1)
    - d0 contains the next less significant bits

    In total, we use 48 bits of the original significand.
*)

(* d1: insert significand bits from high byte *)
d1 := INH AND     2#0000_0000_0000_1111_1111_1111_1111_1111;
(* result:        2#0000_0000_0000_HHHH_HHHH_HHHH_HHHH_HHHH *)

(* add the 1 before the binary point *)
d1 := d1 OR       2#0000_0000_0001_0000_0000_0000_0000_0000;
(* result:        2#0000_0000_0001_HHHH_HHHH_HHHH_HHHH_HHHH *)

(* "flush left" shift 11 places *)
d1 := d1 * 2048;
(* result:        2#1HHH_HHHH_HHHH_HHHH_HHHH_H000_0000_0000 *)

(* Insert another 11 bits from low byte (msb ones) *)
d1 := d1 OR (INL / 2097152);
(* result:        2#1HHH_HHHH_HHHH_HHHH_HHHH_HLLL_LLLL_LLLL *)

(* Base-65536 division. Integer divide by 1000 and save remainder *)
y1 := d1 / 1000;
r1 := TO_DW(TO_UD(d1) MOD 1000);

(*
   The significand now has leading zeroes. Shift left to make space
   at the other end.
*)
FOR shift := 1 TO 31 BY 1 DO
    y1 := y1 * 2;
    IF (y1 AND 2#1000_0000_0000_0000_0000_0000_0000_0000) <> 0 THEN
        EXIT;
    END_IF;
END_FOR;

(*
   d0: insert next 16 bits from the low byte
   (right shift five times and zero out the leading places)
*)
(* bits:             2#xxxx_xxxx_xxxL_LLLL_LLLL_LLLL_LLLx_xxxx *)
d0 := (INL / 32) AND 2#0000_0000_0000_0000_1111_1111_1111_1111;
(* result:           2#0000_0000_0000_0000_LLLL_LLLL_LLLL_LLLL *)

(* Now divide by 1000, factoring in remainder from before *)
y0 := ((r1 * 65536) OR d0) / 1000;

(*
   y1 and y0 contain results from division by 1000. We'll now build a 32 bit
   significand from these.

   y1 = 2#1HHH_HHHH_HHHH_HHHH_HHHH_HHxx_xxxx_xxxx
   y0 = 2#0000_0000_0000_0000_LLLL_LLLL_LLLL_LLLL

   y1 has an uncertain number of zeroes at its end, resulting from the above
   left shifting (number of steps inside variable "shift"). Fill those with the
   most significant bits from y0.

   y0 has 16 valid bits (0..15). Shift right so that the "highest place zero"
   in y1 corresponds with the MSB from y0. (shift by 16-shift)

   y1 = 2#1HHH_HHHH_HHHH_HHHH_HHHH_HHxx_xxxx_xxxx (ex.: shift=10)
   y0 = 2#0000_0000_0000_0000_0000_00LL_LLLL_LLLL
                              ------>^
*)

FOR i := 1 TO 16 - shift BY 1 DO
    y0 := y0 / 2;
END_FOR;

significand := TO_UD(y1 OR y0);
(* Result: 32-bit significand *)

(*
    Exponent: bits (62-32)...(59-32) or bits 30...20 of high byte, respectively

    Coded with bias of 1023 (needs to be subtracted).

    Special cases as per standard:
    - 16#000: signed zero or underflow (map to zero)
    - 16#7FF: inifinite or NaN (map to overflow)
*)
temp := 2#0111_1111_1111_0000_0000_0000_0000_0000 AND INH;
temp := temp / 1048576 ; (* right shift 20 places (2^20) *)
exponent := TO_IN(TO_DI(temp));
exponent := exponent - 1023; (* remove bias *)

(*
   Above, we already left shifted "shift" times, which needs to be taken into
   account here by shifting less.
*)
exponent := exponent - shift;

(*
    The significand will be output as UDINT, but was initially a binary64 with
    binary point behind the leading 1, after which the coded exponent must be
    "executed".

    temp = 2#1.HHH_HHHH_HHHH_HHHH_HHHH_HLLL_LLLL_LLLL

    As UDINT, this already corresponds to a 31-fold left shift.

    Exponent cases as per IEEE 754:

    - exponent < 0:            result < 1
    - exponent = 0:       1 <= result < 2
    - exponent = x > 0: 2^x <= result < 2^(x+1)

    The UDINT output (32 bit) allows us to represent exponents right up to 31.
    Everything above is mapped to UDINT's maximum value.

    Now determine, after the de facto 31-fold left shift, what shifts remain
    "to do".
*)

IF exponent < 0 THEN
    (* underflow: < 2^0 *)
    significand := 0;
ELSIF exponent > 31 THEN
    (* overflow: > 2^32 - 1 *)
    significand := 4294967295;
ELSE
    (*
        result is significand * 2^exponent or here, as mentioned above,
        significand * 2^(31-exponent).

        The loop index i is the "shift target" after loop execution, which is
        why it starts at 31-1.

        Example: exponent = 27, but de facto we've already got a shift of 31.
        So we'll shift back four times to place the binary point at the right
        position (30, 29, 28, 27):

        before: temp = 2#1HHH_HHHH_HHHH_HHHH_HHHH_HLLL_LLLL_LLLL.

        after:  temp = 2#1HHH_HHHH_HHHH_HHHH_HHHH_HLLL_LLLL.LLLL
                                                           ^<---|
    *)
    FOR i := 30 TO exponent BY -1 DO
        significand := significand / 2;
    END_FOR;
END_IF;

(*
    sign: bit 63 of high byte
*)
sign := (2#1000_0000_0000_0000_0000_0000_0000_0000 AND INH) <> 0;

OUT := significand;
SGN := sign;

END_FUNCTION_BLOCK

我使用的测试数据:

  high byte     low byte  decimal value
=======================================
16#41c558c3, 16#2d3f331e,       716_277
16#41EFFFFF, 16#5E000000,     4_294_966
16#41EFFFFF, 16#DB000000,     4_294_967
16#41F00000, 16#2C000000,     4_294_968
16#426D1A94, 16#A1830000,   999_999_999
16#426D1A94, 16#A2000000, 1_000_000_000
16#426D1A94, 16#A27D0000, 1_000_000_001
16#428F3FFF, 16#FFC18000, 4_294_967_294
16#428F3FFF, 16#FFE0C000, 4_294_967_295
16#428F4000, 16#00000000, 4_294_967_296

顺便说一句,SCL 中 b#1234 形式的整数文字基本上表示“以 b 为基数的数字 1234”。下划线将被忽略(它们是数字分隔符,以提高可读性,例如 Python 就有它们)。

最佳答案

/*  This program shows two methods of dividing an integer exceeding 32 bits
    by 1000 using unsigned 32-bit integer arithmetic.
*/


#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>


/*  If the count is less than 2**35, we can shift three bits (divide by 8) and
    then divide by 125 using 32-bit unsigned arithmetic.
*/
static uint32_t ShiftThenDivide(uint64_t x)
{
    uint32_t y = x >> 3;
    return y / 125;
}


/*  Given any count less than 1000*2**32 (which exceeds the 2**40 requirement),
    we can perform long division in radix 65536.
*/
static uint64_t LongDivision(uint64_t x)
{
    /*  Set d1 to the high two base-65536 digits (bits 17 to 31) and d0 to
        the low digit (bits 0 to 15).
    */
    uint32_t d1 = x >> 16, d0 = x & 0xffffu;

    //  Get the quotient and remainder of dividing d1 by 1000.
    uint32_t y1 = d1 / 1000, r1 = d1 % 1000;

    /*  Combine the previous remainder with the low digit of the dividend and
        divide by 1000.
    */
    uint32_t y0 = (r1<<16 | d0) / 1000;

    //  Return a quotient formed from the two quotient digits.
    return y1 << 16 | y0;
}


static void Test(uint64_t x)
{
    //  Use 64-bit arithmetic to get a reference result.
    uint32_t y0 = x / 1000;

    //  ShiftThenDivide only works up to 2**35, so only test up to that.
    if (x < UINT64_C(1) << 35)
    {
        uint32_t y1 = ShiftThenDivide(x);
        if (y1 != y0)
        {
            printf("Error, 0x%" PRIx64 " / 1000 = 0x%" PRIx32 ", but ShiftThenDivide produces 0x%" PRIx32 ".\n",
                x, y0, y1);
            exit(EXIT_FAILURE);
        }
    }

    //  Test LongDivision.
    uint32_t y2 = LongDivision(x);
    if (y2 != y0)
    {
        printf("Error, 0x%" PRIx64 " / 1000 = 0x%" PRIx32 ", but LongDivision produces 0x%" PRIx32 ".\n",
            x, y0, y2);
        exit(EXIT_FAILURE);
    }
}


int main(void)
{
    srandom(time(0));

    //  Test all possible values for the upper eight bits.
    for (uint64_t upper = 0; upper < 1<<8; ++upper)
    {
        //  Test some edge cases.
        uint64_t x = upper << 32;
        Test(x);
        Test(x+1);
        Test(x-1 & 0xffffffffffu);
            /*  When x is zero, x-1 would wrap modulo 2**64, but that is
                outside our supported domain, so wrap modulo 2**40.
            */

        //  Test an assortment of low 32 bits.
        for (int i = 0; i < 1000; ++i)
        {
            uint32_t r0 = random() & 0xffffu, r1 = random() & 0xffffu;
            uint64_t lower = r1 << 16 | r0;
            Test(x | lower);
        }
    }
}

关于floating-point - 在仅支持 32 位 float 的平台上,如何将 IEEE754 64 位 double 除以 1000?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/57724239/

相关文章:

c - K&R - 了解 float - 基数、尾数和基数

python - 如何在字符串中找到 float - Python?

C++货币输出

c++ - 从 int 到 float 并返回时符号发生变化

c - Linux gcc (5.4.0) 中的 float 是否遵守 IEEE754 规则?

java - 线程中的异常 "AWT-EventQueue-0"java.lang.NumberFormatException : empty String

c# - 设置精度c#

math - float 学有问题吗?

c++ - 如何比较两个包含十进制值的字符串?

c++ - float128 和 double-double 运算