[Libomptarget] Use NVPTX lane id intrinsic in DeviceRTL (#84928)

Summary:
We currently take the lower 5 bits of the thread ID as the lane ID.
This doesn't work in non-1D grids and is also slower than just reading
the dedicated hardware register.
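
To illustrate why masking breaks in non-1D launches, here is a minimal
host-side sketch in plain C++. It is not the DeviceRTL code: `Dim3`,
`laneFromThreadIdX`, and `laneFromLinearId` are hypothetical names, and
it assumes "thread ID" means the x component of the thread index and
that threads are packed into warps in x-fastest order (x, then y, then
z), as CUDA documents.

```cpp
#include <cstdint>

// Hypothetical host-side model of a thread index / block shape.
struct Dim3 {
  uint32_t x, y, z;
};

constexpr uint32_t WarpSize = 32; // NVPTX warp size

// Masking approach: keep the lower 5 bits of the x thread ID.
constexpr uint32_t laneFromThreadIdX(Dim3 Tid) {
  return Tid.x & (WarpSize - 1);
}

// Model of the value the hardware lane register reports: the thread's
// position within its warp, derived from the linearized thread index.
constexpr uint32_t laneFromLinearId(Dim3 Tid, Dim3 BlockDim) {
  uint32_t Linear =
      Tid.x + Tid.y * BlockDim.x + Tid.z * BlockDim.x * BlockDim.y;
  return Linear % WarpSize;
}
```

Under these assumptions the two functions agree for a 1D block, but in
a 16x4 block the thread at (0, 1, 0) has linear index 16 and therefore
sits in lane 16, while masking its x component yields 0.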
GitOrigin-RevId: 9f69d3cf88905df5006f93dce536b7e73c0b1735
1 file changed