[RISCV] Avoid vl toggles when lowering vector_splice/experimental_vp_splice and add +vl-dependent-latency tuning feature (#146746)
When vectorizing a loop with a fixed-order recurrence, we use a splice,
which gets lowered to a vslidedown and vslideup pair.
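The splice semantics can be made concrete with a small C++ model. This is a sketch, not LLVM code, of splice(a, b, -k), where k plays the role of UpOffset (k = 1 for a first-order recurrence). The helpers slideDown, slideUp, spliceRef and spliceLowered are hypothetical. They rest on simplified assumptions about RVV behaviour: VL covers the whole vector, there is no EVL, and tail lanes are left undisturbed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Simplified vslidedown (assumption): lanes i < avl read src[i + off], or 0
// past the end of src (VLMAX). Lanes >= avl keep dest's value.
Vec slideDown(const Vec &dest, const Vec &src, std::size_t off, std::size_t avl) {
  Vec out = dest;
  for (std::size_t i = 0; i < avl && i < out.size(); ++i)
    out[i] = (i + off < src.size()) ? src[i + off] : 0;
  return out;
}

// Simplified vslideup (assumption): lanes off <= i < avl read src[i - off];
// lanes below off and lanes >= avl keep dest's value.
Vec slideUp(const Vec &dest, const Vec &src, std::size_t off, std::size_t avl) {
  Vec out = dest;
  for (std::size_t i = off; i < avl && i < out.size(); ++i)
    out[i] = src[i - off];
  return out;
}

// Reference splice(a, b, -k): the last k lanes of a followed by the first
// n - k lanes of b, i.e. a window into the concatenation a ++ b.
Vec spliceRef(const Vec &a, const Vec &b, std::size_t k) {
  assert(a.size() == b.size() && k >= 1 && k <= a.size());
  const std::size_t n = a.size();
  Vec cat(a);
  cat.insert(cat.end(), b.begin(), b.end());
  return Vec(cat.begin() + static_cast<std::ptrdiff_t>(n - k),
             cat.begin() + static_cast<std::ptrdiff_t>(2 * n - k));
}

// The same splice as a vslidedown + vslideup pair, where the vslidedown
// only computes the k lanes that survive the vslideup.
Vec spliceLowered(const Vec &a, const Vec &b, std::size_t k) {
  assert(a.size() == b.size() && k >= 1 && k <= a.size());
  const std::size_t n = a.size();
  Vec tmp = slideDown(a, a, n - k, k); // lanes 0..k-1 = a[n-k..n-1]
  return slideUp(tmp, b, k, n);        // lanes k..n-1 = b[0..n-k-1]
}
```

For example, with prev = {1, 2, 3, 4} and cur = {5, 6, 7, 8}, spliceLowered(prev, cur, 1) gives {4, 5, 6, 7}. The last element of the previous iteration is followed by the current one shifted up by one lane.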
However, with the way we lower it today, we end up with extra vl toggles
in the loop, especially with EVL tail folding, e.g.:
.LBB0_5:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
    sub a5, a2, a3
    sh2add a6, a3, a1
    zext.w a7, a4
    vsetvli a4, a5, e8, mf2, ta, ma
    vle32.v v10, (a6)
    addi a7, a7, -1
    vsetivli zero, 1, e32, m2, ta, ma
    vslidedown.vx v8, v8, a7
    sh2add a6, a3, a0
    vsetvli zero, a5, e32, m2, ta, ma
    vslideup.vi v8, v10, 1
    vadd.vv v8, v10, v8
    add a3, a3, a4
    vse32.v v8, (a6)
    vmv2r.v v8, v10
    bne a3, a2, .LBB0_5
Because the vslideup overwrites all but the first UpOffset elements of the
vslidedown's result, we currently set the vslidedown's AVL to that offset.
But the vslideup uses either VLMAX or the EVL as its AVL, so the two
differing AVLs cause a toggle.
This patch increases the AVL of the vslidedown so that it matches the
vslideup's, avoiding the toggle even though the extra elements are overwritten.
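The same simplified slide model can check that raising the vslidedown's AVL from UpOffset to the vslideup's AVL leaves the spliced result unchanged, because the vslideup overwrites every extra lane. This is again a hypothetical sketch: it ignores EVL, treats VL as the whole vector, and lowerSplice and its downAVL parameter are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Simplified vslidedown (assumption): lanes i < avl read src[i + off], or 0
// past the end of src. Lanes >= avl keep dest's value.
Vec slideDown(const Vec &dest, const Vec &src, std::size_t off, std::size_t avl) {
  Vec out = dest;
  for (std::size_t i = 0; i < avl && i < out.size(); ++i)
    out[i] = (i + off < src.size()) ? src[i + off] : 0;
  return out;
}

// Simplified vslideup (assumption): lanes off <= i < avl read src[i - off];
// all other lanes keep dest's value.
Vec slideUp(const Vec &dest, const Vec &src, std::size_t off, std::size_t avl) {
  Vec out = dest;
  for (std::size_t i = off; i < avl && i < out.size(); ++i)
    out[i] = src[i - off];
  return out;
}

// Lower splice(a, b, -k) with a chosen AVL for the vslidedown.
// downAVL == k models the old lowering; downAVL == n matches the
// vslideup's AVL, so both slides could share one vsetvli.
Vec lowerSplice(const Vec &a, const Vec &b, std::size_t k, std::size_t downAVL) {
  const std::size_t n = a.size();
  assert(b.size() == n && k >= 1 && k <= downAVL && downAVL <= n);
  Vec tmp = slideDown(a, a, n - k, downAVL); // lanes >= k are scratch
  return slideUp(tmp, b, k, n);              // overwrites lanes k..n-1
}
```

Only lanes 0..k-1 of the vslidedown's result are observable after the vslideup. Any AVL of at least k therefore gives the same answer, and the choice can be made purely on cost.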
A new tuning feature, +vl-dependent-latency, has been added which keeps
the old behaviour for microarchitectures that dynamically dispatch uops
based on vl, e.g. sifive-x280, where the shorter vslidedown is cheaper.
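The trade-off that the tuning feature controls can be sketched as follows. TuneInfo, VConfig, slideDownAVL, countVsetvli and togglesForSplicePair are hypothetical names, not LLVM APIs. The vsetvli counter is a deliberately naive model that emits one vsetvli whenever the (AVL, SEW, LMUL) tuple changes; the real RISCVInsertVSETVLI pass is considerably smarter. The AVL strings "1" and "a5" mirror the loop above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the subtarget's tuning info.
struct TuneInfo {
  bool vlDependentLatency; // models +vl-dependent-latency
};

// A vtype/AVL configuration in a naive vsetvli-insertion model.
struct VConfig {
  std::string avl; // symbolic AVL operand, e.g. "1" or "a5"
  unsigned sew;
  std::string lmul;
  bool operator==(const VConfig &o) const {
    return avl == o.avl && sew == o.sew && lmul == o.lmul;
  }
};

// AVL for the vslidedown half of the splice: the minimal UpOffset on cores
// whose latency scales with vl, otherwise the vslideup's AVL.
std::string slideDownAVL(const TuneInfo &tune, const std::string &upOffset,
                         const std::string &vl) {
  return tune.vlDependentLatency ? upOffset : vl;
}

// Naive model: one vsetvli for the first instruction and for every change.
std::size_t countVsetvli(const std::vector<VConfig> &seq) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < seq.size(); ++i)
    if (i == 0 || !(seq[i] == seq[i - 1]))
      ++count;
  return count;
}

// vsetvlis needed for just the vslidedown + vslideup pair.
std::size_t togglesForSplicePair(const TuneInfo &tune) {
  const std::string vl = "a5";
  std::vector<VConfig> seq{
      {slideDownAVL(tune, "1", vl), 32, "m2"}, // vslidedown
      {vl, 32, "m2"},                          // vslideup
  };
  return countVsetvli(seq);
}
```

Under this model the pair needs two vsetvlis when the feature is enabled and one when it is not. That extra toggle is the cost the default lowering now avoids.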
+vl-dependent-latency can be reused for the recently proposed Ovlt
optimization directive if/when it's ratified:
https://lists.riscv.org/g/tech-privileged/message/2487
If we wanted to aggressively optimise for vl at the expense of
introducing more toggles, we could probably look at doing this in
RISCVVLOptimizer.