[AMDGPU][SIInsertWaitcnts] Create a WCG instance per MF (#185916)

WaitcntGenerator (WCG) state depends on MachineFunction (MF) attributes,
so create a new WCG object for each MF until we have a better solution.
This patch also adds a test that exercises this.

Even though we stopped creating a new WCG instance in #177689, the
behavior didn't change, because SIInsertWaitcnts itself gets recreated
for every MF (so this patch is practically NFC).

GitOrigin-RevId: 203c5c58f2fe2ad5089aec33628ce944e68ccf5d