[X86] Lower `minimum`/`maximum`/`minimumnum`/`maximumnum` using bitwise operations (#170069) I got somewhat nerd-sniped when looking at a Rust issue and seeing [this comment about how various min/max operations are compiled on various architectures](https://github.com/rust-lang/rust/issues/91079#issuecomment-3592393680). The current `minimum`/`maximum`/`minimumnum`/`maximumnum` code is very branchy because of the signed-zero handling. Even though we emit select operations, LLVM *really* prefers to lower them to branches, to the point of scalarizing vector code to do so, even if `blendv` is supported. (Should I open a separate issue for that? It seems concerning that LLVM would rather scalarize a vector operation than emit a couple `blendv` operations in a row.) It turns out that handling signed zero operands properly can be done using a couple bitwise operations, which is branchless and easily vectorizable, by taking advantage of the following properties: - When you take the maximum of two floats, the output sign bit will be the bitwise AND of the input sign bits (since 0 means positive, and the maximum always prefers the positive number). - When you take the minimum of two floats, the output sign bit will be the bitwise OR of the input sign bits (since 1 means negative, and the minimum always prefers the negative number). We can further optimize this by taking advantage of the fact that x86's min/max instructions operate like a floating-point compare+select, returning the second operand if both are (positive or negative) zero. Altogether, the operations go as follows: - For taking the minimum: - Call `minps`/`minpd`/etc. on the input operands. This will return the minimum, unless both are zero, in which case it will return the second operand. - Take the bitwise AND of the first operand and the highest bit, so that everything is zero except the sign bit. - Finally, OR that with the minimum from earlier. The only incorrect case was when the second operand was +0.0 and the first operand was -0.0. By OR-ing the first operand's sign bit with the existing minimum, we correct this. - Analogously, for taking the maximum: - Call `maxps`/`maxpd`/etc. on the input operands. This will return the maximum, unless both are zero, in which case it will return the second operand. - Take the bitwise OR of the first operand and a bit pattern which is all ones except for the highest bit, so that everything is ones except the sign bit. - Finally, AND that with the maximum from earlier. In the case of NaNs, this approach might change the output NaN's sign bit. We don't have to worry about this for a couple reasons: firstly, LLVM's language reference [allows NaNs to have a nondeterministic sign bit](https://llvm.org/docs/LangRef.html#floatnan); secondly, there's already a step after this that selects one of the input NaNs anyway. [Here's an Alive2 proof.](https://alive2.llvm.org/ce/z/EfQZ-G) It obviously can't verify that the implementation is sound, but shows that at least the theory is. I believe this approach is faster than even properly-vectorized `blendv` operations because it eliminates a data dependency chain. Furthermore on AVX-512, the load, AND, and OR can become a single `vpternlogd`. My highly-unrepresentative microbenchmarks (compiled for x86-64-v2, so SSE4.1) say ~7.5%-10% faster than `blendv`, which makes me confident this is at least not a regression.
Welcome to the LLVM project!
This repository contains the source code for LLVM, a toolkit for the construction of highly optimized compilers, optimizers, and run-time environments.
The LLVM project has multiple components. The core of the project is itself called “LLVM”. This contains all of the tools, libraries, and header files needed to process intermediate representations and convert them into object files. Tools include an assembler, disassembler, bitcode analyzer, and bitcode optimizer.
C-like languages use the Clang frontend. This component compiles C, C++, Objective-C, and Objective-C++ code into LLVM bitcode -- and from there into object files, using LLVM.
Other components include: the libc++ C++ standard library, the LLD linker, and more.
Consult the Getting Started with LLVM page for information on building and running LLVM.
For information on how to contribute to the LLVM project, please take a look at the Contributing to LLVM guide.
Join the LLVM Discourse forums, Discord chat, LLVM Office Hours or Regular sync-ups.
The LLVM project has adopted a code of conduct for participants to all modes of communication within the project.