e5f03d7facc9040ee5897eb853ea2e274e75e1f4 - llvm-project/openmp

commit	e5f03d7facc9040ee5897eb853ea2e274e75e1f4
author	Joseph Huber <huberjn@outlook.com>	Thu Jan 11 11:32:43 2024 -0600
committer	Copybara-Service <copybara-worker@google.com>	Thu Jan 11 09:35:47 2024 -0800
tree	dfeb531b4055480e1b27f7c982d7879690898e45
parent	369f9e94da8d47940230b48fc053fea2c046fcc5

[Libomptarget] Fix JIT on the NVPTX target by calling ptx manually (#77801)

Summary:
Recently a patch added an assertion in the GlobalHandler to indicate
when an ELF was not used. This began to fire whenever NVPTX JIT was
used, because the JIT pass output a PTX file instead of an ELF. The
CUModuleLoad method consumes `.s` internally and compiles it to a cubin.
However, this is too late, as we perform several checks on the ELF
directly for the presence of certain symbols and to read some necessary
constants. This results in inconsistent behaviour.

To address this, this patch simply calls `ptxas` manually, similar to
how `lld` is called for the AMDGPU JIT pass. This is inevitably going to
be slower than simply passing it to the CUDA routine due to the overhead
involved in file IO and a fork call, but it's necessary for correctness.
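
As a rough illustration of the out-of-process step described above, the
sketch below assembles a `ptxas` command line and runs it. It is not the
code from this patch: the helper names, the fixed `-O2` level, and the use
of `std::system` in place of a proper fork/exec wrapper are all
assumptions made for the example.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: build the argument list for a `ptxas` run that
// turns a PTX file into a cubin (an ELF image the later checks can read).
std::vector<std::string> buildPtxasArgs(const std::string &PtxasPath,
                                        const std::string &GpuArch,
                                        const std::string &PtxInput,
                                        const std::string &CubinOutput) {
  assert(!GpuArch.empty() && "ptxas needs a target such as sm_70");
  return {PtxasPath, "-m64",          "-O2",       "--gpu-name",
          GpuArch,   "--output-file", CubinOutput, PtxInput};
}

// Join the arguments into one command line. No quoting is applied, so
// this sketch assumes paths without spaces or shell metacharacters.
std::string joinCommand(const std::vector<std::string> &Args) {
  std::string Cmd;
  for (const std::string &Arg : Args) {
    if (!Cmd.empty())
      Cmd += ' ';
    Cmd += Arg;
  }
  return Cmd;
}

// Run ptxas on a PTX file already written to disk. Returns true when the
// command exits with status zero; the caller then reads CubinOutput back.
bool compilePtxToCubin(const std::string &PtxasPath,
                       const std::string &GpuArch,
                       const std::string &PtxInput,
                       const std::string &CubinOutput) {
  const std::string Cmd =
      joinCommand(buildPtxasArgs(PtxasPath, GpuArch, PtxInput, CubinOutput));
  return std::system(Cmd.c_str()) == 0;
}
```

The write-to-temp-file, spawn, and read-back sequence is where the extra
file IO and process overhead mentioned above comes from.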

CUDA provides an API for compiling PTX manually. However, this only
started showing up in CUDA 11.1 and is only provided "officially" in a
static library. The `libnvidia-ptxjitcompiler.so` next to the CUDA
driver has the same symbols and can likely be used as a replacement.
This would be the faster solution. However, given that it's not
documented, it may have some issues.

GitOrigin-RevId: 3ede817f5bd947cb0da63187f333a6274bf1f418

libomptarget/plugins-nextgen/cuda/src/rtl.cpp

1 file changed
