| commit | 23845e3ebed4eb2bec486372123dcd2fc560f01b | |
|---|---|---|
| author | Wenju He <wenju.he@intel.com> | Fri Dec 19 14:36:03 2025 +0800 |
| committer | Copybara-Service <copybara-worker@google.com> | Thu Dec 18 22:40:48 2025 -0800 |
| tree | b527956ca222eb46d9dd2a3d456e9ce625d2708f | |
| parent | b653edcb027eb392c3e6bf8e125d4f4c65d51769 | |
[libclc] Improve __clc_min/max/clamp implementation (#172599)
Replace __clc_max/min with __clc_fmax/fmin in __clc_clamp. FP
__clc_min/max/clamp now lowers to @llvm.minimumnum/@llvm.maximumnum, and
integer clamp lowers to @llvm.umin/@llvm.umax. This reduces fcmp+select
chains and improves codegen. Example change to amdgcn--amdhsa.bc:
```
in function _Z5clamphhh:
> %4 = icmp ugt i8 %0, %2
%4 = tail call noundef i8 @llvm.umax.i8(i8 %0, i8 %1)
> %6 = select i1 %4, i8 %2, i8 %5
> ret i8 %6
< %5 = tail call noundef i8 @llvm.umin.i8(i8 %2, i8 %4)
< ret i8 %5
in function _Z5clampddd:
in block %3 / %3:
> %4 = fcmp ogt double %0, %2
> %5 = fcmp olt double %0, %1
> %6 = select i1 %5, double %1, double %0
> %7 = select i1 %4, double %2, double %6
> ret double %7
< %4 = tail call noundef double @llvm.maximumnum.f64(double %0, double %1)
< %5 = tail call noundef double @llvm.minimumnum.f64(double %4, double %2)
< ret double %5
```
GitOrigin-RevId: d5326411fe866e010aadd3af3155b656a5aeaae3
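The lowering described in the commit message can be read as "take the maximum with the lower bound, then the minimum with the upper bound". The following is a minimal sketch of that shape in plain C, not libclc's actual source. All function names here (`umax_u8`, `umin_u8`, `maxnum_f64`, `minnum_f64`, `clamp_u8`, `clamp_f64`) are hypothetical, and the floating-point helpers only approximate `@llvm.maximumnum`/`@llvm.minimumnum`: they return the non-NaN operand when exactly one input is NaN, but make no attempt to model signed-zero ordering or signaling NaNs.

```c
#include <assert.h>
#include <math.h> /* isnan */

/* Hypothetical stand-ins for @llvm.umax.i8 / @llvm.umin.i8. */
unsigned char umax_u8(unsigned char a, unsigned char b) { return a > b ? a : b; }
unsigned char umin_u8(unsigned char a, unsigned char b) { return a < b ? a : b; }

/* Integer clamp as umax followed by umin, mirroring the two calls
   in the new _Z5clamphhh body. */
unsigned char clamp_u8(unsigned char x, unsigned char lo, unsigned char hi) {
    assert(lo <= hi); /* OpenCL leaves clamp undefined when lo > hi */
    return umin_u8(hi, umax_u8(x, lo));
}

/* Rough approximation of @llvm.maximumnum.f64: prefer the non-NaN operand.
   Assumption: signed zeros and signaling NaNs are not modelled here. */
double maxnum_f64(double a, double b) {
    if (isnan(a)) return b;
    if (isnan(b)) return a;
    return a > b ? a : b;
}

/* Rough approximation of @llvm.minimumnum.f64, same caveats as above. */
double minnum_f64(double a, double b) {
    if (isnan(a)) return b;
    if (isnan(b)) return a;
    return a < b ? a : b;
}

/* Floating-point clamp as maximumnum followed by minimumnum, mirroring
   the two calls in the new _Z5clampddd body. */
double clamp_f64(double x, double lo, double hi) {
    assert(!(lo > hi)); /* written this way so NaN bounds do not trip it */
    return minnum_f64(maxnum_f64(x, lo), hi);
}
```

With these definitions, `clamp_u8(200, 10, 100)` yields 100 and `clamp_f64(-3.0, 0.0, 1.0)` yields 0.0. Each clamp is two calls with no comparison result feeding a separate select, which is the structure the diff above shows replacing the compare-and-select chains.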
libclc is an open source implementation of the library requirements of the OpenCL C programming language, as specified by the OpenCL 1.1 Specification. The following sections of the specification impose library requirements:
libclc is intended to be used with the Clang compiler's OpenCL frontend.
libclc is designed to be portable and extensible. To this end, it provides generic implementations of most library requirements, allowing the target to override the generic implementation at the granularity of individual functions.
libclc currently supports PTX, AMDGPU, SPIRV and CLSPV targets, but support for more targets is welcome.
(In the following instructions you can use either make or ninja.)
For an in-tree build, Clang must also be built at the same time:
$ cmake <path-to>/llvm-project/llvm/CMakeLists.txt -DLLVM_ENABLE_PROJECTS="clang" \
-DLLVM_ENABLE_RUNTIMES="libclc" -DCMAKE_BUILD_TYPE=Release -G Ninja
$ ninja
Then install:
$ ninja install
Note you can use the DESTDIR Makefile variable to do staged installs.
$ DESTDIR=/path/for/staged/install ninja install
To build out of tree, or in other words, against an existing LLVM build or install:
$ cmake <path-to>/llvm-project/libclc/CMakeLists.txt -DCMAKE_BUILD_TYPE=Release \
-G Ninja -DLLVM_DIR=$(<path-to>/llvm-config --cmakedir)
$ ninja
Then install as before.
In both cases this will include all supported targets. You can choose which targets are enabled by passing -DLIBCLC_TARGETS_TO_BUILD to CMake. The default is all.
In both cases, the LLVM used must include the targets you want libclc support for (AMDGPU and NVPTX are enabled in LLVM by default). The exception is SPIRV, which does not need an LLVM target but does need the llvm-spirv tool to be available. Either build this tool in-tree, or place it in the directory pointed to by LLVM_TOOLS_BINARY_DIR.