commit: 65ad21d730d25789454d18e811f8ff5db79cb5d4
[log]
author: Luke Lau <luke@igalia.com>
Mon Sep 15 12:14:04 2025 +0800
committer: GitHub <noreply@github.com>
Mon Sep 15 04:14:04 2025 +0000
tree: 24e87afdce8784ba28574ce05edb5810e12a0795
parent: c44e015b49e832c9f3749c33cf5c9d5aacaf60a4 [diff]

[RISCV] Handle recurrences in RISCVVLOptimizer (#151285)

After #144666 we now support vectorizing loops with induction variables
with EVL tail folding. The induction updates don't use VP intrinsics to
avoid VL toggles but instead rely on RISCVVLOptimizer. However
RISCVVLOptimizer can't reason about cycles or recurrences today, which
means we are left with a VL toggle to VLMAX:

# %bb.1:                                # %for.body.preheader
	li	a2, 0
	vsetvli	a3, zero, e32, m2, ta, ma
	vid.v	v8
.LBB0_2:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
	sub	a3, a1, a2
	sh2add	a4, a2, a0
	vsetvli	a3, a3, e32, m2, ta, ma
	vle32.v	v10, (a4)
	add	a2, a2, a3
	vadd.vv	v10, v10, v8
	vse32.v	v10, (a4)
	vsetvli	a4, zero, e32, m2, ta, ma
	vadd.vx	v8, v8, a3
	bne	a2, a1, .LBB0_2

This patch teaches RISCVVLOptimizer to reason about recurrences so we
can remove the VLMAX toggle:

# %bb.1:                                # %for.body.preheader
	li	a2, 0
	vsetvli	a3, zero, e32, m2, ta, ma
	vid.v	v8
.LBB0_2:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
	sub	a3, a1, a2
	sh2add	a4, a2, a0
	vsetvli	a3, a3, e32, m2, ta, ma
	vle32.v	v10, (a4)
	add	a2, a2, a3
	vadd.vv	v10, v10, v8
	vse32.v	v10, (a4)
	vadd.vx	v8, v8, a3
	bne	a2, a1, .LBB0_2

With this we remove a significant number of VL toggles and vsetvli
instructions across llvm-test-suite and SPEC CPU 2017 with tail folding
enabled, since it affects every loop with an induction variable.

This builds upon the work in #124530 where we started computing what VL
each instruction demanded, and generalizes it to an optimistic sparse
dataflow analysis:

- We begin by optimistically assuming no VL is used by any instruction,
and push instructions onto the worklist starting from the bottom.
- For each instruction on the worklist we apply the transfer function,
which propagates the VL needed by that instruction upwards to the
instructions it uses. If a use's demanded VL changes, it's added to the
worklist.
- Eventually this converges to a fixpoint when all uses have been
processed and every demanded VL has been propagated throughout the
entire use-def chain. Only after this is the DemandedVL map accurate.

Some implementation details:

- The roots are stores (or other unsupported instructions not in
`isSupportedInstr`) or copies to physical registers (they fail the
`any_of(MI.defs(), isPhysical)` check)
- This patch untangles `getMinimumVLForUser` and `checkUsers`.
`getMinimumVLForUser` now returns how many lanes of an operand are read
by an instruction, whilst `checkUsers` checks that an instruction and
its users have compatible EEW/EMULs.
- The `DemandedVL` struct was added so that we have a default
constructor of 0 for `DenseMap<const MachineInstr *, DemandedVL>
DemandedVLs`, so we don't need to check if a key exists when looking
things up.

There was no measurable compile time impact on llvm-test-suite or SPEC
CPU 2017. The analysis will always terminate, there are more details in
this EuroLLVM talk here:
https://www.youtube.com/watch?v=Mfb5fRSdJAc

Fixes #149354

5 files changed

tree: 24e87afdce8784ba28574ce05edb5810e12a0795

README.md

The LLVM Compiler Infrastructure

Welcome to the LLVM project!

This repository contains the source code for LLVM, a toolkit for the construction of highly optimized compilers, optimizers, and run-time environments.

The LLVM project has multiple components. The core of the project is itself called “LLVM”. This contains all of the tools, libraries, and header files needed to process intermediate representations and convert them into object files. Tools include an assembler, disassembler, bitcode analyzer, and bitcode optimizer.

C-like languages use the Clang frontend. This component compiles C, C++, Objective-C, and Objective-C++ code into LLVM bitcode -- and from there into object files, using LLVM.

Other components include: the libc++ C++ standard library, the LLD linker, and more.

Getting the Source Code and Building LLVM

Consult the Getting Started with LLVM page for information on building and running LLVM.

For information on how to contribute to the LLVM project, please take a look at the Contributing to LLVM guide.

Getting in touch

Join the LLVM Discourse forums, Discord chat, LLVM Office Hours or Regular sync-ups.

The LLVM project has adopted a code of conduct for participants to all modes of communication within the project.