[OMPT] Fix thread_num for implicit_task_end callbacks in nested parallel regions

implicit_task_end callbacks in nested parallel regions did not always report
the correct thread_num, because the inner parallel region may already have
been finalized when the callback is invoked.
The thread_num is now stored at the beginning of the implicit task and
retrieved from that stored value at the end whenever necessary.
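
The store-and-retrieve idea can be sketched roughly as follows. This is an
illustrative model only, not the actual runtime code: task_info_t, team_t,
saved_thread_num, valid, and both helper functions are hypothetical names
chosen for the sketch.

```c
#include <assert.h>

/* Hypothetical stand-in for per-task bookkeeping; not the real runtime type. */
typedef struct {
    int saved_thread_num; /* recorded when the implicit task begins */
} task_info_t;

/* Hypothetical stand-in for a team that may be finalized before task end. */
typedef struct {
    int valid;      /* 0 once the parallel region has been finalized */
    int thread_num; /* only meaningful while valid != 0 */
} team_t;

/* At implicit-task begin the team is still alive, so record the number. */
static void implicit_task_begin(task_info_t *task, const team_t *team) {
    task->saved_thread_num = team->thread_num;
}

/* At implicit-task end, fall back to the stored value if the team is gone.
   Returns the thread_num that would be passed to the end callback. */
static int implicit_task_end(const task_info_t *task, const team_t *team) {
    return team->valid ? team->thread_num : task->saved_thread_num;
}
```

In this model, a task that began as thread 3 still reports 3 at its end, even
if the team's fields were invalidated in between.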

A test case is added as well.

Differential Revision: https://reviews.llvm.org/D46260

git-svn-id: https://llvm.org/svn/llvm-project/openmp/trunk@331632 91177308-0d34-0410-b5e6-96231b3b80d8
4 files changed