[x86] split FMA with fast-math-flags to avoid libcall

fma reassoc A, B, C --> fadd (fmul A, B), C (when target has no FMA hardware)

C/C++ code may use explicit fma() calls (which become LLVM fma
intrinsics in IR) but then gets compiled with -ffast-math or similar.
For targets that do not have FMA hardware, we don't want to go out to
the math library for a precise but slow FMA result.

I tried this as a generic DAGCombine, but it caused infinite looping
on more than 1 other target, so there's likely some over-reaching fma
formation happening.

There's also a potential intersection of strict FP with fast-math here.
Deferring to current behavior for that case (assuming that strict-ness
overrides fast-ness).

Differential Revision: https://reviews.llvm.org/D83981

GitOrigin-RevId: 50afa18772daca0b6de253a7c5311c81b0a46682
2 files changed