| ======= |
| ThinLTO |
| ======= |
| |
| .. contents:: |
| :local: |
| |
| Introduction |
| ============ |
| |
| *ThinLTO* compilation is a new type of LTO that is both scalable and |
| incremental. *LTO* (Link Time Optimization) achieves better |
| runtime performance through whole-program analysis and cross-module |
| optimization. However, monolithic LTO implements this by merging all |
| input into a single module, which is not scalable |
| in time or memory, and also prevents fast incremental compiles. |
| |
| In ThinLTO mode, as with regular LTO, clang emits LLVM bitcode after the |
| compile phase. The ThinLTO bitcode is augmented with a compact summary |
| of the module. During the link step, only the summaries are read and |
| merged into a combined summary index, which includes an index of function |
| locations for later cross-module function importing. Fast and efficient |
| whole-program analysis is then performed on the combined summary index. |
| |
| However, all transformations, including function importing, occur |
| later when the modules are optimized in fully parallel backends. |
| By default, linkers_ that support ThinLTO are set up to launch |
| the ThinLTO backends in threads. As a result, the usage model is unchanged: |
| the distinction between the fast serial thin link step and the backends |
| is transparent to the user. |
| |
| For more information on the ThinLTO design and current performance, |
| see the LLVM blog post `ThinLTO: Scalable and Incremental LTO |
| <http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html>`_. |
| While tuning is still in progress, results in the blog post show that |
| ThinLTO already performs well compared to LTO, in many cases matching |
| the performance improvement of monolithic LTO. |
| |
| Current Status |
| ============== |
| |
| Clang/LLVM |
| ---------- |
| .. _compiler: |
| |
| The 3.9 release of clang includes ThinLTO support. However, ThinLTO |
| is under active development, and new features, improvements and bugfixes |
| are being added for the next release. For the latest ThinLTO support, |
| `build a recent version of clang and LLVM |
| <https://llvm.org/docs/CMake.html>`_. |
| |
| Linkers |
| ------- |
| .. _linkers: |
| .. _linker: |
| |
| ThinLTO is currently supported for the following linkers: |
| |
| - **gold (via the gold-plugin)**: |
| Similar to monolithic LTO, this requires using |
| a `gold linker configured with plugins enabled |
| <https://llvm.org/docs/GoldPlugin.html>`_. |
| - **ld64**: |
| Starting with `Xcode 8 <https://developer.apple.com/xcode/>`_. |
| - **lld**: |
| Starting with r284050 for ELF, r298942 for COFF. |
| |
| Usage |
| ===== |
| |
| Basic |
| ----- |
| |
| To use ThinLTO, add the ``-flto=thin`` option to both the compile and link steps. For example: |
| |
| .. code-block:: console |
| |
| % clang -flto=thin -O2 file1.c file2.c -c |
| % clang -flto=thin -O2 file1.o file2.o -o a.out |
| |
| When using lld-link, the ``-flto=thin`` option only needs to be added to the compile step: |
| |
| .. code-block:: console |
| |
| % clang-cl -flto=thin -O2 -c file1.c file2.c |
| % lld-link /out:a.exe file1.obj file2.obj |
| |
| As mentioned earlier, by default the linkers will launch the ThinLTO backend |
| threads in parallel, passing the resulting native object files back to the |
| linker for the final native link. As such, the usage model is the same as |
| non-LTO. |
| |
| With gold, if you see an error during the link of the form: |
| |
| .. code-block:: console |
| |
| /usr/bin/ld: error: /path/to/clang/bin/../lib/LLVMgold.so: could not load plugin library: /path/to/clang/bin/../lib/LLVMgold.so: cannot open shared object file: No such file or directory |
| |
| Then either gold was not configured with plugins enabled, or clang |
| was not built with ``-DLLVM_BINUTILS_INCDIR`` set properly. See |
| the instructions for the |
| `LLVM gold plugin <https://llvm.org/docs/GoldPlugin.html#how-to-build-it>`_. |
| |
| Controlling Backend Parallelism |
| ------------------------------- |
| .. _parallelism: |
| |
| By default, the ThinLTO link step will launch as many |
| threads in parallel as there are cores. If the number of |
| cores can't be computed for the architecture, then it will launch as many |
| threads as ``std::thread::hardware_concurrency`` reports. |
| For machines with hyper-threading, this is the total number of |
| virtual cores. For some applications and machine configurations this |
| may be too aggressive, in which case the amount of parallelism can |
| be reduced to ``N`` via: |
| |
| - gold: |
| ``-Wl,-plugin-opt,jobs=N`` |
| - ld64: |
| ``-Wl,-mllvm,-threads=N`` |
| - lld: |
| ``-Wl,--thinlto-jobs=N`` |
| - lld-link: |
| ``/opt:lldltojobs=N`` |
| |
| Other possible values for ``N`` are: |
| |
| - 0: |
| Use one thread per physical core (default) |
| - 1: |
| Use a single thread only (disable multi-threading) |
| - all: |
| Use one thread per logical core (uses all hyper-threads) |
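As a rough illustration of the options above, the following shell sketch maps a linker name to its job-limiting flag and prints a link command capped at 4 backend threads. The helper name ``thinlto_jobs_flag`` and the file names are hypothetical, and ``-fuse-ld=lld`` is assumed to be how lld is selected on the clang command line.

```shell
# Hypothetical helper: print the flag that limits ThinLTO backend
# parallelism to N jobs for the given linker (flags as listed above).
thinlto_jobs_flag() {
  case "$1" in
    gold)     printf '%s\n' "-Wl,-plugin-opt,jobs=$2" ;;
    ld64)     printf '%s\n' "-Wl,-mllvm,-threads=$2" ;;
    lld)      printf '%s\n' "-Wl,--thinlto-jobs=$2" ;;
    lld-link) printf '%s\n' "/opt:lldltojobs=$2" ;;
    *)        echo "unknown linker: $1" >&2; return 1 ;;
  esac
}

# Cap an ELF lld link at 4 backend threads.
JOBS_FLAG="$(thinlto_jobs_flag lld 4)"

# Print the link command for review instead of running it.
echo "clang -flto=thin -O2 -fuse-ld=lld ${JOBS_FLAG} file1.o file2.o -o a.out"
```

Printing the command keeps the sketch independent of which toolchain is installed; in a real build the flag would typically be appended to the link flags of the build system.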
| |
| Incremental |
| ----------- |
| .. _incremental: |
| |
| ThinLTO supports fast incremental builds through the use of a cache, |
| which currently must be enabled through a linker option. |
| |
| - gold (as of LLVM 4.0): |
| ``-Wl,-plugin-opt,cache-dir=/path/to/cache`` |
| - ld64 (support in clang 3.9 and Xcode 8): |
| ``-Wl,-cache_path_lto,/path/to/cache`` |
| - ELF lld (as of LLVM 5.0): |
| ``-Wl,--thinlto-cache-dir=/path/to/cache`` |
| - COFF lld-link (as of LLVM 6.0): |
| ``/lldltocache:/path/to/cache`` |
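A minimal sketch of enabling the cache with ELF lld might look like the following. The cache location under ``$HOME/.cache/thinlto`` and the file names are assumptions chosen for illustration, and ``-fuse-ld=lld`` is assumed to select lld.

```shell
# Hypothetical cache location; any writable directory should work.
CACHE_DIR="${HOME:-/tmp}/.cache/thinlto"

# ELF lld spelling of the cache option, as listed above.
CACHE_FLAG="-Wl,--thinlto-cache-dir=${CACHE_DIR}"

# Passing the same cache directory on every link is what allows later
# links to reuse earlier results. Printed for review, not executed.
echo "clang -flto=thin -O2 -fuse-ld=lld ${CACHE_FLAG} file1.o file2.o -o a.out"
```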
| |
| Cache Pruning |
| ------------- |
| |
| To help keep the size of the cache under control, ThinLTO supports cache |
| pruning. Cache pruning is supported with gold, ld64, and both ELF and COFF lld, |
| but currently only gold and ELF and COFF lld let you control the policy with a |
| policy string. The cache policy must be specified with a linker option. |
| |
| - gold (as of LLVM 6.0): |
| ``-Wl,-plugin-opt,cache-policy=POLICY`` |
| - ELF lld (as of LLVM 5.0): |
| ``-Wl,--thinlto-cache-policy,POLICY`` |
| - COFF lld-link (as of LLVM 6.0): |
| ``/lldltocachepolicy:POLICY`` |
| |
| A policy string is a series of key-value pairs separated by ``:`` characters. |
| Possible key-value pairs are: |
| |
| - ``cache_size=X%``: The maximum size for the cache directory is ``X`` percent |
| of the available space on the disk. Set to 100 to indicate no limit, or |
| 50 to indicate that the cache may not grow beyond half the available |
| disk space. A value over 100 is invalid. A value of 0 disables the percentage |
| size-based pruning. The default is 75%. |
| |
| - ``cache_size_bytes=X``, ``cache_size_bytes=Xk``, ``cache_size_bytes=Xm``, |
| ``cache_size_bytes=Xg``: |
| Sets the maximum size for the cache directory to ``X`` bytes (or KB, MB, |
| GB respectively). A value over the amount of available space on the disk |
| will be reduced to the amount of available space. A value of 0 disables |
| the byte size-based pruning. The default is no byte size-based pruning. |
| |
| Note that ThinLTO will apply both size-based pruning policies simultaneously, |
| and changing one does not affect the other. For example, a policy of |
| ``cache_size_bytes=1g`` on its own will cause both the 1GB and default 75% |
| policies to be applied unless the default ``cache_size`` is overridden. |
| |
| - ``cache_size_files=X``: |
| Set the maximum number of files in the cache directory. Set to 0 to indicate |
| no limit. The default is 1000000 files. |
| |
| - ``prune_after=Xs``, ``prune_after=Xm``, ``prune_after=Xh``: Sets the |
| expiration time for cache files to ``X`` seconds (or minutes, hours |
| respectively). When a file hasn't been accessed for ``prune_after`` seconds, |
| it is removed from the cache. A value of 0 disables the expiration-based |
| pruning. The default is 1 week. |
| |
| - ``prune_interval=Xs``, ``prune_interval=Xm``, ``prune_interval=Xh``: |
| Sets the pruning interval to ``X`` seconds (or minutes, hours |
| respectively). This is intended to be used to avoid scanning the directory |
| too often. It does not impact the decision of which files to prune. A |
| value of 0 forces the scan to occur. The default is every 20 minutes. |
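To make the policy string format concrete, here is a small shell sketch that joins key-value pairs with ``:`` and attaches the result to the ELF lld option. The helper name ``join_policy`` and the specific limits are hypothetical choices, not recommended values.

```shell
# Hypothetical helper: join key=value pairs with ':' into a policy string.
join_policy() {
  policy=""
  for pair in "$@"; do
    # Prepend "existing:" only when the string is already non-empty.
    policy="${policy:+${policy}:}${pair}"
  done
  printf '%s\n' "$policy"
}

# Limit the cache to 50% of available space and to 2 GB, and expire
# entries that have not been accessed for 48 hours.
POLICY="$(join_policy 'cache_size=50%' 'cache_size_bytes=2g' 'prune_after=48h')"

# ELF lld spelling of the policy option, as listed above.
POLICY_FLAG="-Wl,--thinlto-cache-policy,${POLICY}"
echo "$POLICY_FLAG"
```

Because both size-based policies apply simultaneously, the sketch sets ``cache_size`` explicitly rather than relying on the 75% default alongside ``cache_size_bytes``.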
| |
| Clang Bootstrap |
| --------------- |
| |
| To `bootstrap clang/LLVM <https://llvm.org/docs/AdvancedBuilds.html#bootstrap-builds>`_ |
| with ThinLTO, follow these steps: |
| |
| 1. The host compiler_ must be a version of clang that supports ThinLTO. |
| #. The host linker_ must support ThinLTO (and in the case of gold, must be |
| `configured with plugins enabled <https://llvm.org/docs/GoldPlugin.html>`_). |
| #. Use the following additional `CMake variables |
| <https://llvm.org/docs/CMake.html#options-and-variables>`_ |
| when configuring the bootstrap compiler build: |
| |
| * ``-DLLVM_ENABLE_LTO=Thin`` |
| * ``-DCMAKE_C_COMPILER=/path/to/host/clang`` |
| * ``-DCMAKE_CXX_COMPILER=/path/to/host/clang++`` |
| * ``-DCMAKE_RANLIB=/path/to/host/llvm-ranlib`` |
| * ``-DCMAKE_AR=/path/to/host/llvm-ar`` |
| |
| Or, on Windows: |
| |
| * ``-DLLVM_ENABLE_LTO=Thin`` |
| * ``-DCMAKE_C_COMPILER=/path/to/host/clang-cl.exe`` |
| * ``-DCMAKE_CXX_COMPILER=/path/to/host/clang-cl.exe`` |
| * ``-DCMAKE_LINKER=/path/to/host/lld-link.exe`` |
| * ``-DCMAKE_RANLIB=/path/to/host/llvm-ranlib.exe`` |
| * ``-DCMAKE_AR=/path/to/host/llvm-ar.exe`` |
| |
| #. To use additional linker arguments for controlling the backend |
| parallelism_ or enabling incremental_ builds of the bootstrap compiler, |
| after configuring the build, modify the resulting CMakeCache.txt file in the |
| build directory. Specify any additional linker options after |
| ``CMAKE_EXE_LINKER_FLAGS:STRING=``. Note that the configure step may fail if |
| linker plugin options are instead specified directly in the previous step. |
| |
| Setting ``BOOTSTRAP_LLVM_ENABLE_LTO=Thin`` enables ThinLTO for stage 2 and |
| stage 3, which is useful when the compiler used for stage 1 does not support |
| the ThinLTO option. |
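The non-Windows CMake variables from the steps above could be assembled into a configure command roughly as follows. The ``HOST_BIN`` and ``LLVM_SRC`` paths are placeholders, and the ``-G Ninja`` generator is an assumption rather than a requirement.

```shell
# Hypothetical locations of the ThinLTO-capable host toolchain and sources.
HOST_BIN=/path/to/host
LLVM_SRC=/path/to/llvm

# CMake variables from the configuration step (non-Windows variant).
CMAKE_ARGS="-DLLVM_ENABLE_LTO=Thin \
-DCMAKE_C_COMPILER=${HOST_BIN}/clang \
-DCMAKE_CXX_COMPILER=${HOST_BIN}/clang++ \
-DCMAKE_RANLIB=${HOST_BIN}/llvm-ranlib \
-DCMAKE_AR=${HOST_BIN}/llvm-ar"

# Print the configure command for review instead of running it.
echo "cmake -G Ninja ${CMAKE_ARGS} ${LLVM_SRC}"
```

Linker options for parallelism or caching are deliberately left out of this command, since the steps above recommend adding them to ``CMakeCache.txt`` after configuring.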
| |
| More Information |
| ================ |
| |
| * From LLVM project blog: |
| `ThinLTO: Scalable and Incremental LTO |
| <http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html>`_ |