[OpenMP] Move the recording code to account for KernelLaunchEnvironment

The recording has to happen late so that it accounts for the kernel launch
environment as well as any changes to the block and thread counts.

GitOrigin-RevId: 726ee40f524918f9a6a6bba5a73e4d88c02a2cc3
1 file changed