
Understanding Fast-Math


While adding support for Xcode 13 to our iOS PDF SDK, we stumbled upon an interesting issue in our PDF renderer. Everything worked fine in debug builds, but when using the new developer tools to compile in release configuration, some PDF elements were missing. After a lot of printf debugging, it became apparent that a floating-point NaN check had started failing, which eventually led us to the -ffast-math optimization we introduced a few years ago. Luckily, we managed to catch this issue during internal QA before it became a problem for our customers. As we had clearly underestimated the risks associated with this optimization, it seemed prudent to take a closer look at it. It turns out that, as with almost anything else related to IEEE floating-point math, it’s a rabbit hole full of surprising behaviors.

What Is Fast-Math?

-ffast-math is a compiler flag that enables a set of aggressive floating-point optimizations. The flag is shorthand for a collection of different optimization techniques, each having its own compiler flag. The exact set of optimizations differs between compilers, but it generally includes a set of optimizations that leverage algebraic rules that hold for real numbers, but not necessarily for IEEE floats.

Enabling -ffast-math will break strict IEEE compliance for your application and could result in changes in behavior. At best, it might affect the precision of computed numbers. At worst, it might significantly affect the program’s branching and produce completely unexpected results.

In Xcode, the fast-math optimization can be enabled with the GCC_FAST_MATH build setting. You can find it listed as Relax IEEE Compliance in the Xcode Build Settings UI under Apple Clang - Code Generation.

Clang

To better understand the optimizations -ffast-math enables, we can look at the specific set of behaviors (compiler flags) the option implies when using the Clang compiler:

-ffinite-math-only: Shorthand for -fno-honor-infinities and -fno-honor-nans.
- -fno-honor-infinities: The compiler assumes there are no infinities (+/-Inf) in the program (neither in the arguments nor in the results of floating-point arithmetic).
- -fno-honor-nans: The compiler assumes there are no NaNs in the program (neither in the arguments nor in the results of floating-point arithmetic).
-fno-math-errno: Enables optimizations that might cause standard C math functions to not set errno. This avoids a write to a thread-local variable and enables inlining of certain math functions.
-funsafe-math-optimizations: Shorthand for a set of unsafe floating-point optimizations.
- -fassociative-math: Enables optimizations leveraging the associative property of real numbers, i.e. (x + y) + z => x + (y + z). Due to rounding errors, this algebraic law typically doesn’t apply to IEEE floating-point numbers.
- -freciprocal-math: Allows division operations to be transformed into multiplication by a reciprocal, e.g. x = a / c; y = b / c; => tmp = 1.0 / c; x = a * tmp; y = b * tmp. This can be significantly faster than division, but it can also be less precise.
- -fno-signed-zeros: Enables optimizations that ignore the sign of floating-point zeros. Without this option, IEEE arithmetic prescribes specific behaviors for +0.0 and -0.0 values, which prohibit the simplification of expressions like x+0.0 or 0.0*x.
- -fno-trapping-math: The compiler assumes floating-point exceptions won’t ever actually invoke a signal handler, which enables speculative execution of floating-point expressions and other simple optimizations.
-ffp-contract=fast: Enables the use of floating-point contraction instructions, such as fused multiply-add (FMA). These instructions combine two separate floating-point operations into a single operation. They can affect floating-point precision, because instead of rounding after each operation, the processor may round only once after both operations.
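To make two of these rules concrete, here is a minimal C++ sketch of why reassociation and zero simplification aren't value-preserving for IEEE doubles. The function names are hypothetical and chosen only for illustration, and the described results assume the code is compiled without -ffast-math and with the default round-to-nearest mode.

```cpp
#include <cmath>

// Groups the sum as (a + b) + c.
double sum_left(double a, double b, double c) {
    return (a + b) + c;
}

// Groups the sum as a + (b + c), the form -fassociative-math may pick instead.
double sum_right(double a, double b, double c) {
    return a + (b + c);
}

// Returns true when x + 0.0 has a different sign bit than x,
// which is the case -fno-signed-zeros lets the compiler ignore.
bool add_zero_changes_sign(double x) {
    double r = x + 0.0;
    return std::signbit(r) != std::signbit(x);
}
```

With a = 1e20, b = -1e20, and c = 1.0, the left grouping yields 1.0, while the right grouping yields 0.0, because 1.0 is lost to rounding when it's added to -1e20. Likewise, -0.0 + 0.0 is +0.0, so replacing x + 0.0 with x changes the sign of the result when x is a negative zero.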

With Clang (and GCC), the -Ofast optimization level also enables -ffast-math.

Dealing with Finite Math

One of the more controversial optimizations from the list above is -ffinite-math-only, together with its two sub-options, -fno-honor-infinities and -fno-honor-nans. The official Clang documentation doesn’t go into too much detail and just defines -ffinite-math-only as allowing “floating-point optimizations that assume arguments and results are not NaNs or +-Inf.”

This option enables a set of optimizations for arithmetic expressions that seem intuitive for real numbers but aren’t generally possible when we have to deal with NaN and Inf values in floating-point numbers. The option fits well with -fno-signed-zeros to enable an even greater set of optimizations. So far, so good.

The controversy starts when we take a look at the behavior of a function like isnan. How should this check behave when we’re using -ffinite-math-only? Should it make a real check, or should the compiler just optimize it to false? With the current definition of this option, we’re essentially telling the compiler there will never be any NaNs in the program, so it’s technically free to optimize the check to a constant false.

While this optimization might make sense intuitively, provided you first carefully read the compiler documentation for -ffast-math and its sub-options, it also causes quite a few problems. For one, it breaks some reasonable workflows where we’d want to validate input data or where NaNs would be used as memory-efficient sentinels to mark invalid data in floating-point data structures. This is precisely the trap we fell into. Some of our C++ code uses NaNs to indicate invalid values for PDF primitives. Those values are checked with isnan, and branching is done accordingly. The code kept working for years after we first introduced the -ffast-math option. But it was always undefined behavior, and all it took was a compiler update to turn it into a regression.
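The pattern described above can be sketched roughly as follows. This is not our actual renderer code: kInvalid and value_or are hypothetical names used only to illustrate a NaN sentinel that is later checked with isnan.

```cpp
#include <cmath>
#include <limits>

// Hypothetical sentinel that marks "no value" in a floating-point field.
constexpr double kInvalid = std::numeric_limits<double>::quiet_NaN();

// Hypothetical helper: returns `fallback` when `value` is the NaN sentinel.
double value_or(double value, double fallback) {
    // Under -ffinite-math-only, the compiler is allowed to assume this
    // condition is always false and remove the branch entirely.
    if (std::isnan(value)) {
        return fallback;
    }
    return value;
}
```

Compiled with strict IEEE semantics, value_or(kInvalid, 12.0) returns the fallback. With -ffinite-math-only, the same call may return the NaN itself, depending on the compiler version and optimization level, which is the kind of silent change that hit us.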

The -ffinite-math-only optimization also causes inconsistencies, because isnan checks will behave differently depending on whether they’re provided by the compiler or by a library built with different optimization settings (e.g. libc). Other standard APIs might produce surprising behaviors too: for example, the std::numeric_limits<T>::has_quiet_NaN constant might still claim that NaNs are supported even when the optimization is applied. You could even go so far as to say that double and double under -ffinite-math-only should be considered different types, given the differences in behavior you’d see if your project uses -ffinite-math-only selectively.

Another way to look at the logic of having NaN checks optimized out is from a pure performance point of view. It should be fairly safe to assume that code that extensively uses isnan, and could therefore benefit the most by having NaN checks removed, is also the code that most likely cares about the correct output of those checks — and therefore can’t use -ffinite-math-only.

The -ffinite-math-only option could be made safer if its definition were changed to only apply to arithmetic expressions, while otherwise still allowing NaN values. In other words, the assumption of no NaNs would be applied to mathematical expressions and functions, but not to tests like isnan. This alternative has been proposed a few times already, most recently in a fairly lengthy llvm-dev mailing list thread. In it, you can see that there are certainly good arguments to be made for either behavior, and at least for now, it appears as though the discussion ended in a stalemate.

Performance Impact

We could have refactored our code to not use NaNs in this way, or employed a number of workarounds to fix the issue with our isnan checks, like using integer operations to check the bits corresponding to NaN, or selectively disabling -ffinite-math-only in certain files. However, we didn’t do any of that. Instead, we opted to play it safe, and we globally disabled -ffast-math.

The option was introduced before we had reliable performance tests, so I was curious to see what impact this would have. To my surprise, there were no measurable differences outside of the standard deviation we already see when repeating tests multiple times. This isn’t to say that certain floating-point operations didn’t in fact become slower. They most likely did. However, in our case, they don’t seem to be causing any actual bottlenecks.

Conclusion

As you can see, -ffast-math is unfortunately not just a harmless optimization that makes your app run faster; it can also affect the correctness of your code. And even if it doesn’t do that right now, it might do so in the future with new compiler revisions.

Unless you see actual performance bottlenecks with your floating-point calculations, it’s best to avoid -ffast-math. And there’s a good chance it won’t have a significant impact on the performance characteristics of your average program. It didn’t make much of a difference for our renderer, even though it has to deal with a lot of floating-point operations.

If your performance tests do indicate that -ffast-math makes a difference, then be sure to spend some time auditing your floating-point calculations and control flow to avoid the more obvious pitfalls, such as the use of isnan and isinf. In the end, most other issues will still be very hard to notice, so you’ll have to accept a certain amount of risk. For us, the decision was easy — it’s just not worth the trouble.

Author

Matej Bukovinski, CTO

Matej is a software engineering leader from Slovenia. He began his career freelancing and contributing to open source software. Later, he joined PSPDFKit, where he played a key role in creating its initial products and teams, eventually taking over as the company’s Chief Technology Officer. Outside of work, Matej enjoys playing tennis, skiing, and traveling.
