[MachineCSE] Allow CSE for instructions with ignorable operands

Ignorable operands don't affect an instruction's behavior, so we can safely
do CSE on such an instruction.

This is split from D130919. It has a big impact on some AMDGPU test cases.
For example, in atomic_optimizations_raw_buffer.ll, when checking whether the
following instruction can be CSEed

    %37:vgpr_32 = V_MOV_B32_e32 0, implicit $exec

the function isCallerPreservedOrConstPhysReg is called on the operand
"implicit $exec". This function is implemented as

    -  return TRI.isCallerPreservedPhysReg(Reg, MF) ||
    +  return TRI.isCallerPreservedPhysReg(Reg, MF) || TII.isIgnorableUse(MO) ||
              (MRI.reservedRegsFrozen() && MRI.isConstantPhysReg(Reg));

Both TRI.isCallerPreservedPhysReg and MRI.isConstantPhysReg return false on
this operand, so isCallerPreservedOrConstPhysReg also returns false, and LLVM
fails to CSE this instruction.

With this patch, TII.isIgnorableUse returns true for the operand $exec, so
isCallerPreservedOrConstPhysReg also returns true, and the instruction is
CSEed with the previous instruction

    %14:vgpr_32 = V_MOV_B32_e32 0, implicit $exec

This is why the test output changes from this point on.

AMDGPU's implementation of isIgnorableUse is

    bool SIInstrInfo::isIgnorableUse(const MachineOperand &MO) const {
      // Any implicit use of exec by VALU is not a real register read.
      return MO.getReg() == AMDGPU::EXEC && MO.isImplicit() &&
             isVALU(*MO.getParent()) && !resultDependsOnExec(*MO.getParent());
    }

Since the operand $exec is not a real register read, my understanding is that
it is reasonable to do CSE on such instructions.

Because more instructions are CSEed, fewer instructions are generated for
these tests.

Differential Revision: https://reviews.llvm.org/D137222
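The effect of the one-line change described above can be illustrated with a small standalone model. This is only a sketch: `Reg`, `Operand`, `Target`, and the `withPatch` flag are hypothetical stand-ins invented for illustration and are not LLVM's real classes or signatures. The two existing queries are hard-coded to return false, matching what the commit message reports for the `$exec` operand.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; not LLVM's real types.
enum class Reg { EXEC, VCC };

struct Operand {
  Reg reg;
  bool isImplicit;           // models MO.isImplicit()
  bool parentIsVALU;         // models isVALU(*MO.getParent())
  bool resultDependsOnExec;  // models resultDependsOnExec(*MO.getParent())
};

struct Target {
  bool reservedRegsFrozen = true;

  // Assumed to be false for every register in this toy model, as the
  // commit message reports for $exec.
  bool isCallerPreservedPhysReg(Reg) const { return false; }
  bool isConstantPhysReg(Reg) const { return false; }

  // Mirrors the shape of the AMDGPU hook quoted in the commit message.
  bool isIgnorableUse(const Operand &MO) const {
    return MO.reg == Reg::EXEC && MO.isImplicit && MO.parentIsVALU &&
           !MO.resultDependsOnExec;
  }
};

// Models the predicate before (withPatch == false) and after the change.
bool isCallerPreservedOrConstPhysReg(const Operand &MO, const Target &T,
                                     bool withPatch) {
  return T.isCallerPreservedPhysReg(MO.reg) ||
         (withPatch && T.isIgnorableUse(MO)) ||
         (T.reservedRegsFrozen && T.isConstantPhysReg(MO.reg));
}

// Small self-check: an implicit $exec use on a VALU instruction whose
// result does not depend on exec blocks CSE only without the patch.
void checkExample() {
  Target T;
  Operand execUse{Reg::EXEC, /*isImplicit=*/true, /*parentIsVALU=*/true,
                  /*resultDependsOnExec=*/false};
  assert(!isCallerPreservedOrConstPhysReg(execUse, T, /*withPatch=*/false));
  assert(isCallerPreservedOrConstPhysReg(execUse, T, /*withPatch=*/true));
}
```

In this model, an operand whose parent's result does depend on exec is still rejected with the patch enabled, which corresponds to the `!resultDependsOnExec(...)` guard in the quoted AMDGPU hook.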

commit: 11e86868c1a1ee67a1d88ef84b68193d06dc996d
author: Guozhi Wei <carrot@google.com> Mon Nov 14 19:34:59 2022 +0000
committer: Guozhi Wei <carrot@google.com> Mon Nov 14 19:34:59 2022 +0000
tree: d7f469322a82228e7efe29067a9a51b70f66055d
parent: 840a793375fec763c2b2781b82b764325635cc7a
diff --git a/llvm/test/CodeGen/AMDGPU/sgpr-control-flow.ll b/llvm/test/CodeGen/AMDGPU/sgpr-control-flow.ll
index fce7970..fbe91c1 100644
--- a/llvm/test/CodeGen/AMDGPU/sgpr-control-flow.ll
+++ b/llvm/test/CodeGen/AMDGPU/sgpr-control-flow.ll

@@ -162,13 +162,13 @@
 ; SI-NEXT:    s_load_dwordx2 s[0:1], s[0:1], 0xd
 ; SI-NEXT:    s_mov_b32 s2, 0
 ; SI-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v0
+; SI-NEXT:    v_lshlrev_b32_e32 v0, 2, v0
 ; SI-NEXT:    ; implicit-def: $sgpr8_sgpr9
 ; SI-NEXT:    s_and_saveexec_b64 s[10:11], vcc
 ; SI-NEXT:    s_xor_b64 s[10:11], exec, s[10:11]
 ; SI-NEXT:    s_cbranch_execz .LBB3_2
 ; SI-NEXT:  ; %bb.1: ; %else
 ; SI-NEXT:    s_mov_b32 s3, 0xf000
-; SI-NEXT:    v_lshlrev_b32_e32 v0, 2, v0
 ; SI-NEXT:    v_mov_b32_e32 v1, 0
 ; SI-NEXT:    s_waitcnt lgkmcnt(0)
 ; SI-NEXT:    buffer_load_dword v0, v[0:1], s[0:3], 0 addr64
@@ -184,7 +184,6 @@
 ; SI-NEXT:    s_mov_b32 s15, 0xf000
 ; SI-NEXT:    s_mov_b32 s14, 0
 ; SI-NEXT:    s_mov_b64 s[12:13], s[6:7]
-; SI-NEXT:    v_lshlrev_b32_e32 v0, 2, v0
 ; SI-NEXT:    v_mov_b32_e32 v1, 0
 ; SI-NEXT:    buffer_load_dword v0, v[0:1], s[12:15], 0 addr64
 ; SI-NEXT:    s_andn2_b64 s[2:3], s[8:9], exec