c++ - Floating point math execution time

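What causes the longer execution time for the first data set? The assembly instructions are the same.

With the _DN_FLUSH flag not enabled, the first data set takes 63 milliseconds and the second takes 15 milliseconds.
With the _DN_FLUSH flag enabled, the first data set takes 15 milliseconds and the second takes about 0 milliseconds.

So in both cases the first data set takes considerably longer to execute.

Is there any way to reduce its execution time so that it is closer to that of the second data set?

I am using C++ in Visual Studio 2005 with /arch:SSE2 /fp:fast, running on an Intel Core 2 Duo T7700 @ 2.4 GHz under Windows XP Pro.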

#include <windows.h>   // GetTickCount, DWORD
#include <float.h>     // _controlfp, _DN_FLUSH, _MCW_DN
#include <stdio.h>     // printf

#define NUMLOOPS 1000000

int main()
{
    DWORD tickStart, duration;
    int loops;

    // Denormal values flushed to zero by hardware on ALPHA and x86
    // processors with SSE2 support. Ignored on other x86 platforms.
    // Setting this decreases execution time from 63 milliseconds to 16 milliseconds
    // _controlfp(_DN_FLUSH, _MCW_DN);

    // Both operands are below FLT_MIN (about 1.18e-38), so they are denormal
    float denormal = 1.0e-38f;
    float denormalTwo = 1.0e-39f;
    float denormalThree = 1;

    tickStart = GetTickCount();

    // Run First Calculation Loop
    for (loops = 0; loops < NUMLOOPS; loops++)
    {
        denormalThree = denormal - denormalTwo;
    }

    // Get execution time
    duration = GetTickCount() - tickStart;
    printf("Duration = %lums\n", duration);

    // Both operands are well inside the normal float range
    float normal = 1.0e-10f;
    float normalTwo = 1.0e-2f;
    float normalThree = 1;

    tickStart = GetTickCount();

    // Run Second Calculation Loop
    for (loops = 0; loops < NUMLOOPS; loops++)
    {
        normalThree = normal - normalTwo;
    }

    // Get execution time
    duration = GetTickCount() - tickStart;
    printf("Duration = %lums\n", duration);

    return 0;
}

Best answer

Quoting Intel's optimization manual:

When an input operand for a SIMD floating-point instruction [here this includes scalar arithmetic done using SSE] contains values that are less than the representable range of the data type, a denormal exception occurs. This causes a significant performance penalty. An SIMD floating-point operation has a flush-to-zero mode in which the results will not underflow. Therefore subsequent computation will not face the performance penalty of handling denormal input operands.
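
To see which of the question's operands count as denormal, here is a minimal sketch built on the standard `std::fpclassify`. The helper names `isDenormal` and `checkQuestionOperands` are made up for illustration and do not appear in the original post. The sketch assumes default floating-point settings, meaning no flush-to-zero mode and no fast-math compiler flags.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not from the original post): true when x is a
// denormal (subnormal) value, i.e. nonzero but smaller in magnitude
// than the smallest normal number of its type.
template <typename T>
bool isDenormal(T x)
{
    return std::fpclassify(x) == FP_SUBNORMAL;
}

// Usage sketch with the values from the question.
void checkQuestionOperands()
{
    assert(isDenormal(1.0e-38f));             // below FLT_MIN (about 1.18e-38)
    assert(isDenormal(1.0e-39f));
    assert(isDenormal(1.0e-38f - 1.0e-39f));  // the first loop's result
    assert(!isDenormal(1.0e-10f));            // second data set: normal
    assert(!isDenormal(1.0e-2f));
}
```

Both operands and the result of the first loop are denormal, whereas everything in the second loop is normal, which matches the timing difference reported in the question.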

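As for how to avoid this if you cannot flush denormals: do what you can to make sure your data is appropriately scaled so that you never encounter denormals in the first place. Usually this means delaying the application of some scale factor until you have finished all your other computations.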
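
As a rough illustration of delaying the scale factor, the sketch below sums a set of samples twice: once scaling each sample before accumulating, and once scaling only the final sum. Every name here (`ScaledSum`, `scaleEarly`, `scaleLate`, `demoScalingOrder`) and the sample and gain values are assumptions chosen for the example, not taken from the original post. It counts how many intermediate values are denormal, again assuming default floating-point settings.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Final value plus the number of denormal intermediates seen on the way.
struct ScaledSum
{
    float value;
    std::size_t denormalSteps;
};

// Scale every sample first, then accumulate the scaled values.
ScaledSum scaleEarly(const std::vector<float>& samples, float gain)
{
    ScaledSum r = {0.0f, 0};
    for (float s : samples)
    {
        float scaled = s * gain;
        if (std::fpclassify(scaled) == FP_SUBNORMAL) ++r.denormalSteps;
        r.value += scaled;
        if (std::fpclassify(r.value) == FP_SUBNORMAL) ++r.denormalSteps;
    }
    return r;
}

// Accumulate the raw samples, then apply the scale factor once at the end.
ScaledSum scaleLate(const std::vector<float>& samples, float gain)
{
    ScaledSum r = {0.0f, 0};
    float sum = 0.0f;
    for (float s : samples)
    {
        sum += s;
        if (std::fpclassify(sum) == FP_SUBNORMAL) ++r.denormalSteps;
    }
    r.value = sum * gain;
    if (std::fpclassify(r.value) == FP_SUBNORMAL) ++r.denormalSteps;
    return r;
}

// Usage sketch: 1000 samples of 1e-3 with a (hypothetical) gain of 1e-36.
void demoScalingOrder()
{
    std::vector<float> samples(1000, 1.0e-3f);
    ScaledSum early = scaleEarly(samples, 1.0e-36f);
    ScaledSum late = scaleLate(samples, 1.0e-36f);
    assert(early.denormalSteps > 0);   // each scaled sample is about 1e-39
    assert(late.denormalSteps == 0);   // the running sum stays near 1.0
}
```

In this example the final result is a normal number either way; only the order of operations decides whether the intermediate values pass through the denormal range.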

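Alternatively, do the computation in double, which has a larger exponent range, making it much less likely that you will encounter denormals in the first place.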
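
A small sketch of that point, using the question's first subtraction: the same operands that are denormal as float are ordinary normal numbers as double. The names `classifyDifference` and `checkDoubleAvoidsDenormals` are hypothetical and used only for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: classify a - b, computed in type T, using the
// standard FP_* categories from <cmath>.
template <typename T>
int classifyDifference(T a, T b)
{
    T diff = a - b;
    return std::fpclassify(diff);
}

// Usage sketch with the operands of the question's first loop.
void checkDoubleAvoidsDenormals()
{
    assert(classifyDifference<float>(1.0e-38f, 1.0e-39f) == FP_SUBNORMAL);
    assert(classifyDifference<double>(1.0e-38, 1.0e-39) == FP_NORMAL);
}
```

Double is not immune: values smaller in magnitude than roughly 2.2e-308 are denormal there too, but that limit is far below anything in the question's data.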

For more on c++ - floating point math execution time, see the similar question on Stack Overflow: https://stackoverflow.com/questions/2051534/
