[AMDGPU][SIInsertWaitcnts] Create a WCG instance per MF (#185916)

WaitcntGenerator (WCG) state depends on MachineFunction (MF) attributes, so
create a new WCG object for each MF until we have a better solution. This patch
also adds a test that exercises this case.
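
For illustration only, here is a minimal standalone sketch of why a generator that caches per-function state has to be rebuilt for each function. It does not use LLVM: `ToyFunction`, `ToyGenerator`, the `ExpandWaitcnt` attribute, and both `run*` helpers are hypothetical stand-ins, not the actual SIInsertWaitcnts code.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a MachineFunction: only a name and one
// per-function attribute (the attribute itself is made up for this sketch).
struct ToyFunction {
  std::string Name;
  bool ExpandWaitcnt;
};

// Hypothetical stand-in for WaitcntGenerator: it snapshots the function's
// attribute at construction time and never refreshes it.
class ToyGenerator {
  bool Expand;

public:
  explicit ToyGenerator(const ToyFunction &F) : Expand(F.ExpandWaitcnt) {}
  bool expands() const { return Expand; }
};

// Creates the generator once and reuses it, so every later function sees
// the state captured from the first function.
std::vector<bool> runReusingGenerator(const std::vector<ToyFunction> &Fns) {
  std::unique_ptr<ToyGenerator> Gen;
  std::vector<bool> Seen;
  for (const ToyFunction &F : Fns) {
    if (!Gen)
      Gen = std::make_unique<ToyGenerator>(F);
    Seen.push_back(Gen->expands());
  }
  return Seen;
}

// Creates a fresh generator per function, so the state always matches the
// function currently being processed.
std::vector<bool> runGeneratorPerFunction(const std::vector<ToyFunction> &Fns) {
  std::vector<bool> Seen;
  for (const ToyFunction &F : Fns) {
    ToyGenerator Gen(F);
    Seen.push_back(Gen.expands());
  }
  return Seen;
}
```

With two functions whose attribute differs, the reusing variant reports the first function's setting for both, while the per-function variant tracks each one.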

Although #177689 stopped creating a new WCG instance, the behavior did not
change, because SIInsertWaitcnts is itself recreated for every MF. This patch
is therefore effectively NFC.

GitOrigin-RevId: 203c5c58f2fe2ad5089aec33628ce944e68ccf5d
2 files changed