[OpenMP][NVPTX] Add forward declarations for atomic operations

Similar to D95058, this patch adds forward declarations for the CUDA atomic
functions. Definitions with the right mangled names already exist in the
internal CUDA headers, so calls through these declarations resolve to those
existing definitions.
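A minimal host-side sketch of the technique, assuming plain C++ in place of
CUDA: the primary templates are declared but never defined, and per-type
definitions are supplied separately. Here they are explicit specializations
in the same file. In the real build the commit says they come from the CUDA
internal headers. The mutex-based bodies, `atomic_add_wrapper` and
`atomic_cas_wrapper` are hypothetical stand-ins. The wrappers mirror
`__kmpc_atomic_add`, and the `DEVICE` macro is omitted.

```cpp
#include <mutex>

// Stand-in for the forward declarations in target_impl.h: the primary
// templates are declared but deliberately never defined.
template <typename T> T atomicAdd(T *address, T val);
template <typename T> T atomicCAS(T *address, T compare, T val);

namespace {
// Hypothetical host-only substitute for hardware atomics: one global lock
// serializes every read-modify-write below.
std::mutex g_atomic_lock;
} // namespace

// Per-type definitions provided "elsewhere". As explicit specializations
// they carry the mangled name of atomicAdd<int> / atomicCAS<int>, which is
// what the calls emitted against the bare declarations refer to.
template <> int atomicAdd<int>(int *address, int val) {
  std::lock_guard<std::mutex> guard(g_atomic_lock);
  int old = *address;
  *address = old + val;
  return old; // return the value seen before the update
}

template <> int atomicCAS<int>(int *address, int compare, int val) {
  std::lock_guard<std::mutex> guard(g_atomic_lock);
  int old = *address;
  if (old == compare)
    *address = val;
  return old; // caller compares old with `compare` to detect success
}

// Generic wrappers in the style of __kmpc_atomic_add: they only need the
// declarations above to compile. Instantiating them for a type with no
// specialization (e.g. double) still compiles, but fails at link time with
// an undefined reference to atomicAdd<double>.
template <typename T> inline T atomic_add_wrapper(T *address, T val) {
  return atomicAdd(address, val);
}

template <typename T>
inline T atomic_cas_wrapper(T *address, T compare, T val) {
  return atomicCAS(address, compare, val);
}
```

The design choice is that the header never has to see a template body.
Whether a given `T` is supported is decided at link time by whether a
definition with the matching mangled name exists.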

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D95085

GitOrigin-RevId: 48c54f0f623407192e93dc884724a12826eeab4f
diff --git a/libomptarget/deviceRTLs/nvptx/src/target_impl.h b/libomptarget/deviceRTLs/nvptx/src/target_impl.h
index ba3d331..1d7b649 100644
--- a/libomptarget/deviceRTLs/nvptx/src/target_impl.h
+++ b/libomptarget/deviceRTLs/nvptx/src/target_impl.h
@@ -130,6 +130,15 @@
 DEVICE unsigned GetWarpId();
 DEVICE unsigned GetLaneId();
 
+// Forward declaration of atomics. Although they're template functions, we
+// already have definitions for different types in CUDA internal headers with
+// the right mangled names.
+template <typename T> DEVICE T atomicAdd(T *address, T val);
+template <typename T> DEVICE T atomicInc(T *address, T val);
+template <typename T> DEVICE T atomicMax(T *address, T val);
+template <typename T> DEVICE T atomicExch(T *address, T val);
+template <typename T> DEVICE T atomicCAS(T *address, T compare, T val);
+
 // Atomics
 template <typename T> INLINE T __kmpc_atomic_add(T *address, T val) {
   return atomicAdd(address, val);