[OPENMP][NVPTX]Fix barriers and parallel level counters, NFC.

Summary:
The parallel level counter should be volatile to prevent some dangerous
optimizations by ptxas. Otherwise, ptxas optimizations lead to
undefined behaviour in some cases.
Also, use __threadfence() for #pragma omp flush. If the barrier is not
needed (there is only one thread in the team), still perform the flush
operation, since the standard requires an implicit flush when
executing a barrier.
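
A minimal sketch of the two ideas above, not the code from this patch.
All names (HD, enter_parallel, exit_parallel, flush_memory,
team_barrier) are hypothetical, and the host fallback with
std::atomic_thread_fence is an assumption added only so the sketch
also builds as plain C++. Only __threadfence() and __syncthreads() are
real CUDA intrinsics.

```cpp
#include <atomic>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD // host-only build: no CUDA qualifiers
#endif

// The counter is accessed through a volatile pointer, so every read and
// write is a real memory access that the compiler may not cache or elide.
HD inline void enter_parallel(volatile unsigned *level) { *level = *level + 1; }
HD inline void exit_parallel(volatile unsigned *level) { *level = *level - 1; }

// Stand-in for what "#pragma omp flush" lowers to in this sketch.
HD inline void flush_memory() {
#ifdef __CUDA_ARCH__
  __threadfence(); // device: order this thread's memory accesses
#else
  std::atomic_thread_fence(std::memory_order_seq_cst); // host fallback (assumption)
#endif
}

// Returns true if a real barrier was needed. With a single thread the
// barrier is skipped, but the implicit flush is still performed.
HD inline bool team_barrier(unsigned num_threads) {
  if (num_threads > 1) {
#ifdef __CUDA_ARCH__
    __syncthreads(); // assumes every thread in the block reaches this call
#endif
    return true;
  }
  flush_memory();
  return false;
}
```

For example, team_barrier(1) skips the synchronization and only
flushes, while team_barrier(4) takes the barrier path.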

Reviewers: gtbercea, kkwli0, grokos

Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D62199

git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@361421 91177308-0d34-0410-b5e6-96231b3b80d8
3 files changed