[OPENMP][CUDA]Use __syncthreads when compiled by nvcc and clang >= 9.0.

Summary:
The problems with __syncthreads() were fixed in clang >= 9.0 and the
original __syncthreads() can be used instead of the ptx instruction.

Reviewers: grokos

Subscribers: guansong, jdoerfert, openmp-commits, kkwli0, caomhin

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D63515

git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@363807 91177308-0d34-0410-b5e6-96231b3b80d8
diff --git a/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h b/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h
index cd51538..646417d 100644
--- a/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h
+++ b/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h
@@ -61,7 +61,12 @@
 #endif
 
 #define __SYNCTHREADS_N(n) asm volatile("bar.sync %0;" : : "r"(n) : "memory");
+// Use original __syncthreads if compiled by nvcc or clang >= 9.0.
+#if !defined(__clang__) || __clang_major__ >= 9
+#define __SYNCTHREADS() __syncthreads()
+#else
 #define __SYNCTHREADS() __SYNCTHREADS_N(0)
+#endif
 
 // arguments needed for L0 parallelism only.
 class omptarget_nvptx_SharedArgs {