[OpenMP][DeviceRTL] Fixed an issue that causes a hang in SU3

The synchronization at the end of the parallel region does not guarantee that
all threads have exited the scope. As a result, the assertions right after it
might fail, and the `state::assumeInitialState(IsSPMD)` in
`__kmpc_target_deinit` may not hold either. We can either add a synchronization
right after the parallel region or remove the assertions and assumptions. Here
we choose the former, as those assertions and assumptions can help
optimizations.
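
For illustration only, the race can be modeled on the host with standard C++
threads. This is a minimal sketch, not the DeviceRTL code: `Barrier`,
`ScopedValue`, `TeamSize`, and `runRegion` are hypothetical names, with
`Barrier::wait()` standing in for `synchronize::threadsAligned()` and
`ScopedValue` standing in for a scoped state change that is undone when the
scope ends. It assumes that only one thread holds the scoped state.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable barrier, standing in for synchronize::threadsAligned().
class Barrier {
public:
  explicit Barrier(unsigned Count) : Threshold(Count), Remaining(Count) {}

  void wait() {
    std::unique_lock<std::mutex> Lock(M);
    const unsigned Gen = Generation;
    if (--Remaining == 0) {
      // Last arrival: start a new generation and release the waiters.
      ++Generation;
      Remaining = Threshold;
      CV.notify_all();
    } else {
      CV.wait(Lock, [&] { return Gen != Generation; });
    }
  }

private:
  std::mutex M;
  std::condition_variable CV;
  const unsigned Threshold;
  unsigned Remaining;
  unsigned Generation = 0;
};

// Hypothetical RAII guard: sets a value on entry, restores it on scope exit.
struct ScopedValue {
  std::atomic<unsigned> &Ref;
  unsigned Old;
  ScopedValue(std::atomic<unsigned> &R, unsigned New) : Ref(R), Old(R.load()) {
    Ref.store(New);
  }
  ~ScopedValue() { Ref.store(Old); }
};

// Returns how many threads observed the initial state (TeamSize == 1)
// after leaving the region.
unsigned runRegion(unsigned NumThreads) {
  std::atomic<unsigned> TeamSize{1};
  std::atomic<unsigned> SawInitialState{0};
  Barrier B(NumThreads);

  std::vector<std::thread> Threads;
  for (unsigned Id = 0; Id < NumThreads; ++Id) {
    Threads.emplace_back([&, Id] {
      if (Id == 0) {
        ScopedValue Guard(TeamSize, NumThreads);
        B.wait(); // Publish the scoped state to the team.
        B.wait(); // End-of-region barrier, still inside the scope.
        // Guard restores TeamSize here, after the barrier above.
      } else {
        B.wait();
        B.wait();
      }

      // Extra barrier after the scope: without it, a thread could reach the
      // check below before thread 0 has restored TeamSize.
      B.wait();

      if (TeamSize.load() == 1)
        ++SawInitialState;
    });
  }
  for (std::thread &T : Threads)
    T.join();
  return SawInitialState.load();
}
```

With the extra barrier, every thread in this model sees the restored value.
If that barrier is dropped, the check races with the destructor of `Guard`.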

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112861

GitOrigin-RevId: 025f5492401489269ab980910f4fda98f5b06bd1
diff --git a/libomptarget/DeviceRTL/src/Parallelism.cpp b/libomptarget/DeviceRTL/src/Parallelism.cpp
index 8dcda21..ae7df3f 100644
--- a/libomptarget/DeviceRTL/src/Parallelism.cpp
+++ b/libomptarget/DeviceRTL/src/Parallelism.cpp
@@ -123,6 +123,11 @@
       synchronize::threadsAligned();
     }
 
+    // Synchronize all threads to make sure every thread exits the scope above;
+    // otherwise the following assertions and the assumption in
+    // __kmpc_target_deinit may not hold.
+    synchronize::threadsAligned();
+
     ASSERT(state::ParallelTeamSize == 1u);
     ASSERT(icv::ActiveLevel == 0u);
     ASSERT(icv::Level == 0u);