| .. _omp112: |
| |
| Found thread data sharing on the GPU. Expect degraded performance due to data globalization. [OMP112] |
| ===================================================================================================== |
| |
| This missed remark indicates that a globalized value was found on the target |
| device that was not either replaced with stack memory by :ref:`OMP110 <omp110>` |
| or shared memory by :ref:`OMP111 <omp111>`. Globalization that has not been |
| removed will need to be handled by the runtime and will significantly impact |
| performance. |
| |
| The OpenMP standard requires that threads are able to share their data between |
| each-other. However, this is not true by default when offloading to a target |
| device such as a GPU. Threads on a GPU cannot shared their data unless it is |
| first placed in global or shared memory. In order to create standards complaint |
| code, the Clang compiler will globalize any variables that could potentially be |
| shared between the threads. In the majority of cases, globalized variables can |
| either be returns to a thread-local stack, or pushed to shared memory. However, |
| in a few cases it is necessary and will cause a performance penalty. |
| |
| Examples |
| -------- |
| |
| This example shows legitimate data sharing on the device. It is a convoluted |
| example, but is completely complaint with the OpenMP standard. If globalization |
| was not added this would result in different results on different target |
| devices. |
| |
| .. code-block:: c++ |
| |
| #include <omp.h> |
| #include <cstdio> |
| |
| #pragma omp declare target |
| static int *p; |
| #pragma omp end declare target |
| |
| void foo() { |
| int x = omp_get_thread_num(); |
| if (omp_get_thread_num() == 1) |
| p = &x; |
| |
| #pragma omp barrier |
| |
| printf ("Thread %d: %d\n", omp_get_thread_num(), *p); |
| } |
| |
| int main() { |
| #pragma omp target parallel |
| foo(); |
| } |
| |
| .. code-block:: console |
| |
| $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O1 -Rpass-missed=openmp-opt omp112.cpp |
| omp112.cpp:9:7: remark: Found thread data sharing on the GPU. Expect degraded performance |
| due to data globalization. [OMP112] [-Rpass-missed=openmp-opt] |
| int x = omp_get_thread_num(); |
| ^ |
| |
| A less convoluted example globalization that cannot be removed occurs when |
| calling functions that aren't visible from the current translation unit. |
| |
| .. code-block:: c++ |
| |
| extern void use(int *x); |
| |
| void foo() { |
| int x; |
| use(&x); |
| } |
| |
| int main() { |
| #pragma omp target parallel |
| foo(); |
| } |
| |
| .. code-block:: console |
| |
| $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O1 -Rpass-missed=openmp-opt omp112.cpp |
| omp112.cpp:4:7: remark: Found thread data sharing on the GPU. Expect degraded performance |
| due to data globalization. [OMP112] [-Rpass-missed=openmp-opt] |
| int x; |
| ^ |
| |
| Diagnostic Scope |
| ---------------- |
| |
| OpenMP target offloading missed remark. |