| ================================ |
| Frequently Asked Questions (FAQ) |
| ================================ |
| |
| .. contents:: |
|    :local: |
| |
| |
| License |
| ======= |
| |
| Can I modify LLVM source code and redistribute the modified source? |
| ------------------------------------------------------------------- |
| Yes. The modified source distribution must retain the copyright notice and |
| follow the conditions listed in the `Apache License v2.0 with LLVM Exceptions |
| <https://github.com/llvm/llvm-project/blob/main/llvm/LICENSE.TXT>`_. |
| |
| |
| Can I modify the LLVM source code and redistribute binaries or other tools based on it, without redistributing the source? |
| -------------------------------------------------------------------------------------------------------------------------- |
| Yes. This is why we distribute LLVM under a less restrictive license than the |
| GPL: the license linked in the first question above does not require you to |
| release your source code. |
| |
| |
| Can I use AI coding tools, such as GitHub Copilot, to write LLVM patches? |
| -------------------------------------------------------------------------- |
| Yes, as long as the resulting work can be licensed under the project license, as |
| covered in the :doc:`DeveloperPolicy`. Using an AI tool to reproduce copyrighted |
| work does not rinse it of copyright and grant you the right to relicense it. |
| |
| |
| Source Code |
| =========== |
| |
| In what language is LLVM written? |
| --------------------------------- |
| All of the LLVM tools and libraries are written in C++ with extensive use of |
| the STL. |
| |
| |
| How portable is the LLVM source code? |
| ------------------------------------- |
| The LLVM source code should be portable to most modern Unix-like operating |
| systems. LLVM also has excellent support on Windows systems. |
| Most of the code is written in standard C++ with operating system |
| services abstracted to a support library. The tools required to build and |
| test LLVM have been ported to a plethora of platforms. |
| |
| |
| What API do I use to store a value to one of the virtual registers in LLVM IR's SSA representation? |
| --------------------------------------------------------------------------------------------------- |
| |
| In short: you can't. It's actually kind of a silly question once you grok |
| what's going on. Consider this instruction: |
| |
| .. code-block:: llvm |
| |
|    %result = add i32 %foo, %bar |
| |
| Here, ``%result`` is just a name given to the ``Value`` of the ``add`` |
| instruction. In other words, ``%result`` *is* the add instruction. The |
| "assignment" doesn't explicitly "store" anything to any "virtual register"; |
| the "``=``" is more like the mathematical sense of equality. |
| |
| Longer explanation: In order to generate a textual representation of the |
| IR, some kind of name has to be given to each instruction so that other |
| instructions can textually reference it. However, the isomorphic in-memory |
| representation that you manipulate from C++ has no such restriction, since |
| instructions can simply keep pointers to any other ``Value`` objects that they |
| reference. In fact, the names of dummy numbered temporaries like ``%1`` are |
| not explicitly represented in the in-memory representation at all (see |
| ``Value::getName()``). |
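| |
| As a rough illustration of that last point, here is a toy model in C. It is |
| not LLVM's actual class hierarchy: ``ToyValue`` and ``make_add`` are |
| hypothetical names invented for this sketch. It only shows that an instruction |
| can refer to its operands by pointer, so a name is optional. |
| |
| .. code-block:: c |
| |
|    #include <stddef.h> |
| |
|    /* Toy model, not LLVM's real classes: a value has an optional name and |
|       refers to its operands by pointer. */ |
|    typedef struct ToyValue { |
|      const char *name;             /* NULL for an unnamed temporary */ |
|      const char *opcode;           /* "arg" or "add" in this sketch */ |
|      struct ToyValue *operands[2]; /* direct links, no name lookup needed */ |
|    } ToyValue; |
| |
|    /* Build "add lhs, rhs". The returned object is the result value itself; |
|       there is no separate register to store into. */ |
|    ToyValue make_add(ToyValue *lhs, ToyValue *rhs, const char *name) { |
|      ToyValue v = { name, "add", { lhs, rhs } }; |
|      return v; |
|    } |
| |
| A later instruction that uses the sum would simply hold a pointer to that |
| ``ToyValue``, whether or not it was ever given a name. |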
| |
| |
| Source Languages |
| ================ |
| |
| What source languages are supported? |
| ------------------------------------ |
| |
| LLVM currently has full support for C and C++ source languages through |
| `Clang <https://clang.llvm.org/>`_. Many other language frontends have |
| been written using LLVM, and an incomplete list is available at |
| `projects with LLVM <https://llvm.org/ProjectsWithLLVM/>`_. |
| |
| |
| I'd like to write a self-hosting LLVM compiler. How should I interface with the LLVM middle-end optimizers and back-end code generators? |
| ---------------------------------------------------------------------------------------------------------------------------------------- |
| Your compiler front-end will communicate with LLVM by creating a module in the |
| LLVM intermediate representation (IR) format. Assuming you want to write your |
| language's compiler in the language itself (rather than C++), there are three |
| major ways to tackle generating LLVM IR from a front-end: |
| |
| 1. **Call into the LLVM library code using your language's FFI (foreign |
|    function interface).** |
| |
|    * *for:* best tracks changes to the LLVM IR, .ll syntax, and .bc format |
| |
|    * *for:* enables running LLVM optimization passes without emit/parse |
|      overhead |
| |
|    * *for:* adapts well to a JIT context |
| |
|    * *against:* lots of ugly glue code to write |
| |
| 2. **Emit LLVM assembly from your compiler's native language.** |
| |
|    * *for:* very straightforward to get started |
| |
|    * *against:* the .ll parser is slower than the bitcode reader when |
|      interfacing to the middle end |
| |
|    * *against:* it may be harder to track changes to the IR |
| |
| 3. **Emit LLVM bitcode from your compiler's native language.** |
| |
|    * *for:* can use the more-efficient bitcode reader when interfacing to the |
|      middle end |
| |
|    * *against:* you'll have to re-engineer the LLVM IR object model and bitcode |
|      writer in your language |
| |
|    * *against:* it may be harder to track changes to the IR |
| |
| If you go with the first option, the C bindings in ``include/llvm-c`` should |
| help a lot, since most languages have strong support for interfacing with C. |
| The most common hurdle with calling C from managed code is interfacing with |
| the garbage collector. The C interface was designed to require very little |
| memory management, and so is straightforward in this regard. |
| |
| What support is there for higher-level source language constructs for building a compiler? |
| -------------------------------------------------------------------------------------------- |
| Currently, there isn't much. LLVM supports an intermediate representation |
| which is useful for code representation but does not provide the high-level |
| (abstract syntax tree) representation needed by most compilers. There are no |
| facilities for lexical or semantic analysis. |
| |
| |
| I don't understand the ``GetElementPtr`` instruction. Help! |
| ----------------------------------------------------------- |
| See `The Often Misunderstood GEP Instruction <GetElementPtr.html>`_. |
| |
| |
| Using the C and C++ Front Ends |
| ============================== |
| |
| Can I compile C or C++ code to platform-independent LLVM bitcode? |
| ----------------------------------------------------------------- |
| No. C and C++ are inherently platform-dependent languages. The most obvious |
| example of this is the preprocessor. A very common way that C code is made |
| portable is by using the preprocessor to include platform-specific code. In |
| practice, information about other platforms is lost after preprocessing, so |
| the result is inherently dependent on the platform that the preprocessing was |
| targeting. |
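| |
| A minimal sketch of the effect, assuming a hypothetical helper named |
| ``path_separator`` and the ``_WIN32`` macro that common Windows compilers |
| predefine. After preprocessing, only one ``return`` statement is left, so |
| whatever IR is generated from it already encodes the target platform. |
| |
| .. code-block:: c |
| |
|    /* Hypothetical helper: the preprocessor keeps exactly one branch, so the |
|       compiler proper never sees the code meant for the other platform. */ |
|    char path_separator(void) { |
|    #if defined(_WIN32) |
|      return '\\'; |
|    #else |
|      return '/'; |
|    #endif |
|    } |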
| |
| Another example is ``sizeof``. It's common for ``sizeof(long)`` to vary |
| between platforms. In most C front-ends, ``sizeof`` is expanded to a |
| constant immediately, thus hard-wiring a platform-specific detail. |
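| |
| For instance, in the sketch below (``bytes_for_longs`` is a hypothetical |
| name), the multiplier is fixed when the file is compiled. It is commonly 8 on |
| 64-bit Linux targets and 4 on 64-bit Windows targets, so the two compilations |
| would be expected to yield different constants. |
| |
| .. code-block:: c |
| |
|    #include <stddef.h> |
| |
|    /* Hypothetical helper: sizeof(long) is folded to a target-specific |
|       constant by the front-end, not left as a symbolic expression. */ |
|    size_t bytes_for_longs(size_t count) { |
|      return count * sizeof(long); |
|    } |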
| |
| Also, since many platforms define their ABIs in terms of C, and since LLVM is |
| lower-level than C, front-ends currently must emit platform-specific IR in |
| order to have the result conform to the platform ABI. |
| |
| |
| Questions about code generated by the demo page |
| =============================================== |
| |
| What is this ``llvm.global_ctors`` and ``_GLOBAL__I_a...`` stuff that happens when I ``#include <iostream>``? |
| ------------------------------------------------------------------------------------------------------------- |
| If you ``#include`` the ``<iostream>`` header into a C++ translation unit, |
| the file will probably use the ``std::cin``/``std::cout``/... global objects. |
| However, C++ does not guarantee an order of initialization between static |
| objects in different translation units, so if a static ctor/dtor in your .cpp |
| file used ``std::cout``, for example, the object would not necessarily be |
| automatically initialized before your use. |
| |
| To make ``std::cout`` and friends work correctly in these scenarios, the STL |
| that we use declares a static object that gets created in every translation |
| unit that includes ``<iostream>``. This object has a static constructor |
| and destructor that initialize and destroy the global iostream objects |
| before they could possibly be used in the file. The code that you see in the |
| ``.ll`` file corresponds to the constructor and destructor registration code. |
| |
| If you would like to make it easier to *understand* the LLVM code generated |
| by the compiler in the demo page, consider using ``printf()`` instead of |
| ``iostream``\s to print values. |
| |
| |
| Where did all of my code go?? |
| ----------------------------- |
| If you are using the LLVM demo page, you may often wonder what happened to |
| all of the code that you typed in. Remember that the demo script is running |
| the code through the LLVM optimizers, so if your code doesn't actually do |
| anything useful, it might all be deleted. |
| |
| To prevent this, make sure that the code is actually needed. For example, if |
| you are computing some expression, return the value from the function instead |
| of leaving it in a local variable. If you really want to constrain the |
| optimizer, you can read from and assign to ``volatile`` global variables. |
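| |
| Both suggestions can be sketched as follows. All names here |
| (``square_plus_one``, ``g_input``, ``g_output``, ``compute_with_volatile``) |
| are hypothetical and chosen only for illustration. |
| |
| .. code-block:: c |
| |
|    /* Returning the result makes it observable to callers, so the function |
|       still has to produce it. */ |
|    int square_plus_one(int x) { |
|      int result = x * x + 1; |
|      return result; |
|    } |
| |
|    /* Hypothetical globals: volatile accesses must be preserved, so the read |
|       and the write below cannot be optimized away. */ |
|    volatile int g_input = 6; |
|    volatile int g_output; |
| |
|    void compute_with_volatile(void) { |
|      int x = g_input;  /* volatile read */ |
|      g_output = x * 7; /* volatile write */ |
|    } |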
| |
| |
| What is this "``undef``" thing that shows up in my code? |
| -------------------------------------------------------- |
| ``undef`` is the LLVM way of representing a value that is not defined. You |
| can get these if you do not initialize a variable before you use it. For |
| example, the C function: |
| |
| .. code-block:: c |
| |
|    int X() { int i; return i; } |
| |
| is compiled to "``ret i32 undef``" because "``i``" never has a value specified |
| for it. |
| |
| |
| Why does instcombine + simplifycfg turn a call to a function with a mismatched calling convention into "unreachable"? Why not make the verifier reject it? |
| ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
| This is a common problem run into by authors of front-ends that are using |
| custom calling conventions: you need to make sure to set the right calling |
| convention on both the function and on each call to the function. For |
| example, this code: |
| |
| .. code-block:: llvm |
| |
|    define fastcc void @foo() { |
|      ret void |
|    } |
|    define void @bar() { |
|      call void @foo() |
|      ret void |
|    } |
| |
| is optimized to: |
| |
| .. code-block:: llvm |
| |
|    define fastcc void @foo() { |
|      ret void |
|    } |
|    define void @bar() { |
|      unreachable |
|    } |
| |
| ... with "``opt -instcombine -simplifycfg``". This often bites people because |
| "all their code disappears". Setting the calling convention on the caller and |
| callee is required for indirect calls to work, so people often ask why not |
| make the verifier reject this sort of thing. |
| |
| The answer is that this code has undefined behavior, but it is not illegal. |
| If we made it illegal, then every transformation that could potentially create |
| this would have to ensure that it doesn't, and there is valid code that can |
| create this sort of construct (in dead code). The sorts of things that can |
| cause this to happen are fairly contrived, but we still need to accept them. |
| Here's an example: |
| |
| .. code-block:: llvm |
| |
|    define fastcc void @foo() { |
|      ret void |
|    } |
|    define internal void @bar(ptr %FP, i1 %cond) { |
|      br i1 %cond, label %T, label %F |
|    T: |
|      call void %FP() |
|      ret void |
|    F: |
|      call fastcc void %FP() |
|      ret void |
|    } |
|    define void @test() { |
|      %X = or i1 false, false |
|      call void @bar(ptr @foo, i1 %X) |
|      ret void |
|    } |
| |
| In this example, "test" always passes ``@foo``/``false`` into ``bar``, which |
| ensures that it is dynamically called with the right calling convention (thus, |
| the code is perfectly well defined). If you run this through the inliner, you |
| get this (the explicit "or" is there so that the inliner doesn't dead code |
| eliminate a bunch of stuff): |
| |
| .. code-block:: llvm |
| |
|    define fastcc void @foo() { |
|      ret void |
|    } |
|    define void @test() { |
|      %X = or i1 false, false |
|      br i1 %X, label %T.i, label %F.i |
|    T.i: |
|      call void @foo() |
|      br label %bar.exit |
|    F.i: |
|      call fastcc void @foo() |
|      br label %bar.exit |
|    bar.exit: |
|      ret void |
|    } |
| |
| Here you can see that the inlining pass made an undefined call to ``@foo`` |
| with the wrong calling convention. We really don't want to make the inliner |
| have to know about this sort of thing, so it needs to be valid code. In this |
| case, dead code elimination can trivially remove the undefined code. However, |
| if ``%X`` were an input argument to ``@test``, the inliner would produce this: |
| |
| .. code-block:: llvm |
| |
|    define fastcc void @foo() { |
|      ret void |
|    } |
| |
|    define void @test(i1 %X) { |
|      br i1 %X, label %T.i, label %F.i |
|    T.i: |
|      call void @foo() |
|      br label %bar.exit |
|    F.i: |
|      call fastcc void @foo() |
|      br label %bar.exit |
|    bar.exit: |
|      ret void |
|    } |
| |
| The interesting thing about this is that ``%X`` *must* be false for the |
| code to be well-defined, but no amount of dead code elimination will be able |
| to delete the broken call as unreachable. However, since |
| ``instcombine``/``simplifycfg`` turns the undefined call into unreachable, we |
| end up with a branch on a condition that goes to unreachable: a branch to |
| unreachable can never happen, so "``-inline -instcombine -simplifycfg``" is |
| able to produce: |
| |
| .. code-block:: llvm |
| |
|    define fastcc void @foo() { |
|      ret void |
|    } |
|    define void @test(i1 %X) { |
|    F.i: |
|      call fastcc void @foo() |
|      ret void |
|    } |