[OPENMP][NVPTX]Correctly handle L2 parallelism in SPMD mode.

The parallelLevel counter must be on per-thread basis to fully support
L2+ parallelism, otherwise we may end up with undefined behavior.
Introduce the parallelLevel on per-warp basis using shared memory. It
allows to avoid the problems with the synchronization and allows fully
support L2+ parallelism in SPMD mode with no runtime.

Reviewers: gtbercea, grokos

Subscribers: guansong, jdoerfert, caomhin, kkwli0, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D60918

git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@359341 91177308-0d34-0410-b5e6-96231b3b80d8
8 files changed