c++ - What happens if "-ffast-math" is enabled at link time?

Tags: c++ gcc linker clang

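I am using both gcc10 and clang12 on Ubuntu. I just found that if I enable the -ffast-math flag, my C++ project runs about 4 times faster.
However, if I enable -ffast-math only at compile time and not at link time, there is no performance improvement. What does it mean to use -ffast-math when linking? Does it link against any special fast-math libraries on the system?
P.S.: This speedup actually just brings the performance back to normal. I previously asked a question about the poor performance of AVX instructions on Intel processors. Now I can get normal performance on Linux simply by compiling and linking the program with the -ffast-math flag, but even when I use clang with -ffast-math on Windows, the performance is still poor. So I would like to know whether I am linking against some special system library on Linux.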

Best answer

However, if I only enable -ffast-math at compile time and not at link time, there will be no performance improvement. What does it mean to use -ffast-math when linking, and will it link to any special ffast-math libraries in the system?


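It turns out that gcc links in crtfastmath.o when -ffast-math is specified for the linker (an undocumented feature).
For x86 (see https://github.com/gcc-mirror/gcc/blob/master/libgcc/config/i386/crtfastmath.c#L83), it sets the following CPU options in the MXCSR register at program startup: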
#define MXCSR_DAZ (1 << 6)  /* Enable denormals are zero mode */
#define MXCSR_FTZ (1 << 15) /* Enable flush to zero mode */
Denormalized floating-point numbers are much slower to process, so disabling them in the CPU makes floating-point computation faster.
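One way to check whether that startup code actually took effect is to read MXCSR at run time and test the two bits. The sketch below assumes an x86-64 target where `_mm_getcsr` from `<xmmintrin.h>` is available. The helper names `daz_enabled` and `ftz_enabled` are hypothetical, and the bit positions are taken from the defines above.

```cpp
#include <xmmintrin.h>  // _mm_getcsr (reads the SSE control/status register)

// Bit positions copied from the crtfastmath.c defines quoted above.
constexpr unsigned int kMxcsrDaz = 1u << 6;   // denormals-are-zero
constexpr unsigned int kMxcsrFtz = 1u << 15;  // flush-to-zero

// Hypothetical helpers: report whether the calling thread currently has
// the corresponding MXCSR bit set.
inline bool daz_enabled() { return (_mm_getcsr() & kMxcsrDaz) != 0; }
inline bool ftz_enabled() { return (_mm_getcsr() & kMxcsrFtz) != 0; }
```

Printing both values at the top of `main` in a binary linked with -ffast-math and in one linked without it should show the difference described above.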
From the Intel 64 and IA-32 Architectures Optimization Reference Manual:

6.5.3 Flush-to-Zero and Denormals-are-Zero Modes

The flush-to-zero (FTZ) and denormals-are-zero (DAZ) modes are not compatible with the IEEE Standard 754. They are provided to improve performance for applications where underflow is common and where the generation of a denormalized result is not necessary.

3.8.3.3 Floating-point Exceptions in SSE/SSE2/SSE3 Code

Most special situations that involve masked floating-point exceptions are handled efficiently in hardware. When a masked overflow exception occurs while executing SSE/SSE2/SSE3 code, processor hardware can handle it without performance penalty.

Underflow exceptions and denormalized source operands are usually treated according to the IEEE 754 specification, but this can incur significant performance delay. If a programmer is willing to trade pure IEEE 754 compliance for speed, two non-IEEE 754 compliant modes are provided to speed situations where underflows and input are frequent: FTZ mode and DAZ mode.

When the FTZ mode is enabled, an underflow result is automatically converted to a zero with the correct sign. Although this behavior is not compliant with IEEE 754, it is provided for use in applications where performance is more important than IEEE 754 compliance. Since denormal results are not produced when the FTZ mode is enabled, the only denormal floating-point numbers that can be encountered in FTZ mode are the ones specified as constants (read only).

The DAZ mode is provided to handle denormal source operands efficiently when running a SIMD floating-point application. When the DAZ mode is enabled, input denormals are treated as zeros with the same sign. Enabling the DAZ mode is the way to deal with denormal floating-point constants when performance is the objective.

If departing from the IEEE 754 specification is acceptable and performance is critical, run SSE/SSE2/SSE3 applications with FTZ and DAZ modes enabled.
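Where crtfastmath.o is not linked in (possibly the case for the Windows clang build mentioned in the question), the same two bits could be set by hand. This is a minimal sketch, assuming x86-64 with SSE floating-point math. `enable_ftz_daz` and `half_of_min_normal` are hypothetical names, and the code mirrors the idea behind crtfastmath.c rather than reproducing it.

```cpp
#include <limits>
#include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr

constexpr unsigned int kDazBit = 1u << 6;   // MXCSR_DAZ in crtfastmath.c
constexpr unsigned int kFtzBit = 1u << 15;  // MXCSR_FTZ in crtfastmath.c

// Hypothetical helper: set FTZ and DAZ for the calling thread.
// Assumes the CPU supports DAZ (true of current x86-64 processors).
inline void enable_ftz_daz() {
    _mm_setcsr(_mm_getcsr() | kDazBit | kFtzBit);
}

// Halve the smallest normal float at run time. The exact result, 2^-127,
// lies below the normal range, so it is either a denormal or flushed to zero.
// volatile keeps the compiler from folding the multiplication away.
inline float half_of_min_normal() {
    volatile float smallest_normal = std::numeric_limits<float>::min();
    volatile float half = 0.5f;
    return smallest_normal * half;
}
```

In a build where neither bit is set at startup, `half_of_min_normal()` returns a nonzero denormal before `enable_ftz_daz()` is called and `0.0f` afterwards. MXCSR is per-thread state, so the call only changes the thread that makes it.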

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/68938175/
