[OpenMP][libomptarget] Enable usage of shared memory slots

Summary:
Allow the runtime to use the existing statically allocated shared memory slots.

When a variable is globalized, the underlying storage can be either shared or global memory, since both are visible to every thread in the block. In this case, we allow the storage to use a limited amount of shared memory that has already been statically allocated. Only if that shared memory is not enough do we invoke malloc() to create a new global memory slot.
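
The allocation policy described above can be illustrated with a minimal host-side sketch. It is not the runtime's implementation: StaticSlot, pushStack, the 256-byte capacity and the 8-byte alignment are hypothetical names and values chosen for illustration only. Starting with Used at zero plays the role of initializing the stack pointer to the start of the slot's data region.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for one statically allocated shared memory slot.
struct StaticSlot {
  static constexpr std::size_t Capacity = 256; // illustrative size only
  alignas(8) char Data[Capacity];
  std::size_t Used = 0; // offset of the stack pointer within Data
};

// Hypothetical helper: returns storage for DataSize bytes.
// The preallocated slot is tried first; malloc() is used only when the
// remaining space in the slot is too small. *FromMalloc tells the caller
// whether the returned pointer must later be released with free().
void *pushStack(StaticSlot &Slot, std::size_t DataSize, bool *FromMalloc) {
  assert(FromMalloc != nullptr);

  // Round the request up to a multiple of 8 bytes (assumed alignment).
  std::size_t Aligned = (DataSize + 7) & ~std::size_t(7);

  // The first comparison rejects requests whose rounding wrapped around.
  if (Aligned >= DataSize && Aligned <= StaticSlot::Capacity - Slot.Used) {
    void *Ptr = &Slot.Data[Slot.Used];
    Slot.Used += Aligned;
    *FromMalloc = false;
    return Ptr;
  }

  // The preallocated slot is exhausted: fall back to a fresh allocation.
  *FromMalloc = true;
  return std::malloc(DataSize);
}
```

In this sketch the fast path is a pointer bump with no allocator call, which is why the preallocated storage is tried before malloc().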

Reviewers: ABataev, carlo.bertolli, grokos, caomhin

Reviewed By: grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D44486

git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@327639 91177308-0d34-0410-b5e6-96231b3b80d8
diff --git a/libomptarget/deviceRTLs/nvptx/src/data_sharing.cu b/libomptarget/deviceRTLs/nvptx/src/data_sharing.cu
index 41976f6..e739ca9 100644
--- a/libomptarget/deviceRTLs/nvptx/src/data_sharing.cu
+++ b/libomptarget/deviceRTLs/nvptx/src/data_sharing.cu
@@ -342,16 +342,7 @@
 
   DataSharingState.SlotPtr[WID] = RootS;
   DataSharingState.TailPtr[WID] = RootS;
-
-  // Initialize the stack pointer to be equal to the end of
-  // the shared memory slot. This way we ensure that the global
-  // version of the stack will be used.
-  // TODO: remove this:
-  DataSharingState.StackPtr[WID] = RootS->DataEnd;
-
-  // TODO: When the use of shared memory is enabled we will have to
-  // initialize this with the start of the Data region like so:
-  // DataSharingState.StackPtr[WID] = (void *)&RootS->Data[0];
+  DataSharingState.StackPtr[WID] = (void *)&RootS->Data[0];
 
   // We initialize the list of references to arguments here.
   omptarget_nvptx_globalArgs.Init();
@@ -368,11 +359,6 @@
 // Called by: master, TODO: call by workers
 EXTERN void* __kmpc_data_sharing_push_stack(size_t DataSize,
     int16_t UseSharedMemory) {
-  // TODO: Add shared memory support. For now, use global memory only for
-  // storing the data sharing slots so ignore the pre-allocated
-  // shared memory slot.
-
-  // Use global memory for storing the stack.
   if (IsMasterThread()) {
     unsigned WID = getWarpId();