[AMDGPU] Fix large return values with amdgpu_gfx

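Returning values in memory is not supported, so when a return value is too large for registers, fall back to sret (returning through a hidden pointer argument).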
Also, extend i1 and i16 to i32. Otherwise, they would be passed through
memory.

Differential Revision: https://reviews.llvm.org/D100543

GitOrigin-RevId: 7842e1725e80863cb5462351afbc293cb3a19111