[OPENMP][AMDGCN] Improvements to print_kernel_trace (bit mask)

allow bit masking to select various trace features.
  bit 0 => Launch tracing           (stderr)
  bit 1 => timing of runtime        (stdout)
  bit 2 => detailed launch tracing  (stderr)
  bit 3 => timing goes to stdout instead of stderr

  example: LIBOMPTARGET_KERNEL_TRACE=7     does it all
           LIBOMPTARGET_KERNEL_TRACE=5     Launch + details
           LIBOMPTARGET_KERNEL_TRACE=2     timings + launch to stderr
           LIBOMPTARGET_KERNEL_TRACE=10    timings + launch to stdout

Differential Revision: https://reviews.llvm.org/D96998

GitOrigin-RevId: 30c0d5b4c3f89efef8f79c47fd892d06432d87b1
2 files changed