[AMDGPU] Correct rmw atomics s_waitcnt generation
The AMD GPU SIMemoryLegalizer was using the ordering address space
rather than the instruction address space when determining the
s_waitcnt to generate to ensure that a read-modify-write atomic has
completed. This resulted in additional unnecessary counters being
waited on.
Differential Revision: https://reviews.llvm.org/D96743
diff --git a/llvm/test/CodeGen/AMDGPU/atomicrmw-nand.ll b/llvm/test/CodeGen/AMDGPU/atomicrmw-nand.ll
index 76cd337..774270e 100644
--- a/llvm/test/CodeGen/AMDGPU/atomicrmw-nand.ll
+++ b/llvm/test/CodeGen/AMDGPU/atomicrmw-nand.ll
@@ -42,7 +42,7 @@
; GCN-NEXT: v_or_b32_e32 v2, -5, v2
; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GCN-NEXT: global_atomic_cmpswap v2, v[0:1], v[2:3], off glc
-; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
+; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_wbinvl1_vol
; GCN-NEXT: v_cmp_eq_u32_e32 vcc, v2, v3
; GCN-NEXT: s_or_b64 s[4:5], vcc, s[4:5]