[OpenMP][DeviceRTL] Extract shuffle idiom and port it to declare variant

The shuffle idiom is implemented differently on each of our supported
targets. To reduce the size of the "target_impl" files, we now move the
shuffle idiom into its own self-contained header, which provides the
implementations for AMDGPU and NVPTX. A fallback can be added later on.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D95752

GitOrigin-RevId: 66ba494b4974017ba6e42deed138b9fb9ad50af7
10 files changed