| ======================================================= |
| Building a JIT: Starting out with KaleidoscopeJIT |
| ======================================================= |
| |
| .. contents:: |
| :local: |
| |
| Chapter 1 Introduction |
| ====================== |
| |
| **Warning: This tutorial is currently being updated to account for ORC API |
| changes. Only Chapters 1 and 2 are up-to-date.** |
| |
| **Example code from Chapters 3 to 5 will compile and run, but has not been |
| updated.** |
| |
| Welcome to Chapter 1 of the "Building an ORC-based JIT in LLVM" tutorial. This |
| tutorial runs through the implementation of a JIT compiler using LLVM's |
| On-Request-Compilation (ORC) APIs. It begins with a simplified version of the |
| KaleidoscopeJIT class used in the |
| `Implementing a language with LLVM <LangImpl01.html>`_ tutorials and then |
| introduces new features like concurrent compilation, optimization, lazy |
| compilation and remote execution. |
| |
| The goal of this tutorial is to introduce you to LLVM's ORC JIT APIs, show how |
| these APIs interact with other parts of LLVM, and to teach you how to recombine |
| them to build a custom JIT that is suited to your use-case. |
| |
| The structure of the tutorial is: |
| |
| - Chapter #1: Investigate the simple KaleidoscopeJIT class. This will |
| introduce some of the basic concepts of the ORC JIT APIs, including the |
| idea of an ORC *Layer*. |
| |
| - `Chapter #2 <BuildingAJIT2.html>`_: Extend the basic KaleidoscopeJIT by adding |
| a new layer that will optimize IR and generated code. |
| |
| - `Chapter #3 <BuildingAJIT3.html>`_: Further extend the JIT by adding a |
| Compile-On-Demand layer to lazily compile IR. |
| |
| - `Chapter #4 <BuildingAJIT4.html>`_: Improve the laziness of our JIT by |
| replacing the Compile-On-Demand layer with a custom layer that uses the ORC |
| Compile Callbacks API directly to defer IR-generation until functions are |
| called. |
| |
| - `Chapter #5 <BuildingAJIT5.html>`_: Add process isolation by JITing code into |
| a remote process with reduced privileges using the JIT Remote APIs. |
| |
| To provide input for our JIT we will use a lightly modified version of the |
| Kaleidoscope REPL from `Chapter 7 <LangImpl07.html>`_ of the "Implementing a |
| language with LLVM" tutorial. |
| |
| Finally, a word on API generations: ORC is the 3rd generation of LLVM JIT API. |
| It was preceded by MCJIT, and before that by the (now deleted) legacy JIT. |
| These tutorials don't assume any experience with these earlier APIs, but |
| readers acquainted with them will see many familiar elements. Where appropriate |
| we will make this connection with the earlier APIs explicit to help people who |
| are transitioning from them to ORC. |
| |
| JIT API Basics |
| ============== |
| |
| The purpose of a JIT compiler is to compile code "on-the-fly" as it is needed, |
| rather than compiling whole programs to disk ahead of time as a traditional |
| compiler does. To support that aim our initial, bare-bones JIT API will have |
| just two functions: |
| |
| 1. ``Error addModule(std::unique_ptr<Module> M)``: Make the given IR module |
| available for execution. |
| 2. ``Expected<JITEvaluatedSymbol> lookup(StringRef Name)``: Search for pointers to |
| symbols (functions or variables) that have been added to the JIT. |
| |
| A basic use-case for this API, executing the 'main' function from a module, |
| will look like: |
| |
| .. code-block:: c++ |
| |
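| JIT J; |
| cantFail(J.addModule(buildModule())); // addModule returns an Error |
| // lookup returns an Expected, so unwrap it before asking for the address. |
| auto MainSym = cantFail(J.lookup("main")); |
| auto *Main = (int(*)(int, char*[]))MainSym.getAddress(); |
| int Result = Main(argc, argv); // argc and argv are assumed to be in scope |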
| |
| The APIs that we build in these tutorials will all be variations on this simple |
| theme. Behind this API we will refine the implementation of the JIT to add |
| support for concurrent compilation, optimization and lazy compilation. |
| Eventually we will extend the API itself to allow higher-level program |
| representations (e.g. ASTs) to be added to the JIT. |
| |
| KaleidoscopeJIT |
| =============== |
| |
| In the previous section we described our API. Now we examine a simple |
| implementation of it: the KaleidoscopeJIT class [1]_ that was used in the |
| `Implementing a language with LLVM <LangImpl01.html>`_ tutorials. We will use |
| the REPL code from `Chapter 7 <LangImpl07.html>`_ of that tutorial to supply the |
| input for our JIT: Each time the user enters an expression the REPL will add a |
| new IR module containing the code for that expression to the JIT. If the |
| expression is a top-level expression like '1+1' or 'sin(x)', the REPL will also |
| use the lookup method of our JIT class to find and execute the code for the |
| expression. In later chapters of this tutorial we will modify the REPL to enable |
| new interactions with our JIT class, but for now we will take this setup for |
| granted and focus our attention on the implementation of our JIT itself. |
| |
| Our KaleidoscopeJIT class is defined in the KaleidoscopeJIT.h header. After the |
| usual include guards and #includes [2]_, we get to the definition of our class: |
| |
| .. code-block:: c++ |
| |
| #ifndef LLVM_EXECUTIONENGINE_ORC_KALEIDOSCOPEJIT_H |
| #define LLVM_EXECUTIONENGINE_ORC_KALEIDOSCOPEJIT_H |
| |
| #include "llvm/ADT/StringRef.h" |
| #include "llvm/ExecutionEngine/JITSymbol.h" |
| #include "llvm/ExecutionEngine/Orc/CompileUtils.h" |
| #include "llvm/ExecutionEngine/Orc/Core.h" |
| #include "llvm/ExecutionEngine/Orc/ExecutionUtils.h" |
| #include "llvm/ExecutionEngine/Orc/IRCompileLayer.h" |
| #include "llvm/ExecutionEngine/Orc/JITTargetMachineBuilder.h" |
| #include "llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h" |
| #include "llvm/ExecutionEngine/SectionMemoryManager.h" |
| #include "llvm/IR/DataLayout.h" |
| #include "llvm/IR/LLVMContext.h" |
| #include <memory> |
| |
| namespace llvm { |
| namespace orc { |
| |
| class KaleidoscopeJIT { |
| private: |
| ExecutionSession ES; |
| RTDyldObjectLinkingLayer ObjectLayer; |
| IRCompileLayer CompileLayer; |
| |
| DataLayout DL; |
| MangleAndInterner Mangle; |
| ThreadSafeContext Ctx; |
| |
| public: |
| KaleidoscopeJIT(JITTargetMachineBuilder JTMB, DataLayout DL) |
| : ObjectLayer(ES, |
| []() { return std::make_unique<SectionMemoryManager>(); }), |
| CompileLayer(ES, ObjectLayer, ConcurrentIRCompiler(std::move(JTMB))), |
| DL(std::move(DL)), Mangle(ES, this->DL), |
| Ctx(std::make_unique<LLVMContext>()) { |
| ES.getMainJITDylib().addGenerator( |
| cantFail(DynamicLibrarySearchGenerator::GetForCurrentProcess( |
| this->DL.getGlobalPrefix()))); // use the member, not the moved-from parameter |
| } |
| |
| Our class begins with six member variables: an ExecutionSession member, ``ES``, |
| which provides context for our running JIT'd code (including the string pool, |
| global mutex, and error reporting facilities); an RTDyldObjectLinkingLayer, |
| ``ObjectLayer``, that can be used to add object files to our JIT (though we will |
| not use it directly); an IRCompileLayer, ``CompileLayer``, that can be used to |
| add LLVM Modules to our JIT (and which builds on the ObjectLayer); a DataLayout |
| and MangleAndInterner, ``DL`` and ``Mangle``, that will be used for symbol mangling |
| (more on that later); and finally a ThreadSafeContext, ``Ctx``, which wraps the |
| LLVMContext that clients will use when building IR modules for the JIT. |
| |
| Next up we have our class constructor, which takes a ``JITTargetMachineBuilder`` |
| that will be used by our IRCompiler, and a ``DataLayout`` that we will use to |
| initialize our DL member. The constructor begins by initializing our |
| ObjectLayer. The ObjectLayer requires a reference to the ExecutionSession, and |
| a function object that will build a JIT memory manager for each module that is |
| added (a JIT memory manager manages memory allocations, memory permissions, and |
| registration of exception handlers for JIT'd code). For this we use a lambda |
| that returns a SectionMemoryManager, an off-the-shelf utility that provides all |
| the basic memory management functionality required for this chapter. Next we |
| initialize our CompileLayer. The CompileLayer needs three things: (1) A |
| reference to the ExecutionSession, (2) A reference to our object layer, and (3) |
| a compiler instance to use to perform the actual compilation from IR to object |
| files. We use the off-the-shelf ConcurrentIRCompiler utility as our compiler, |
| which we construct using this constructor's JITTargetMachineBuilder argument. |
| The ConcurrentIRCompiler utility will use the JITTargetMachineBuilder to build |
| LLVM TargetMachines (which are not thread safe) as needed for compiles. After |
| this, we initialize our supporting members: ``DL``, ``Mangle`` and ``Ctx`` with |
| the input DataLayout, the ExecutionSession and DL member, and a new |
| default-constructed LLVMContext respectively. Now that our members have been |
| initialized, the one thing that remains to do is to tweak the configuration of the |
| *JITDylib* that we will store our code in. We want to modify this dylib to |
| contain not only the symbols that we add to it, but also the symbols from our |
| REPL process. We do this by attaching a |
| ``DynamicLibrarySearchGenerator`` instance using the |
| ``DynamicLibrarySearchGenerator::GetForCurrentProcess`` method. |
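
To make the role of the generator concrete, here is a minimal sketch of the
idea in plain C++, with no LLVM dependency. It is a toy model rather than the
ORC API: ``ToyDylib``, ``makeHostProcessGenerator`` and the addresses in the
host table are all hypothetical. The sketch assumes that a lookup searches the
definitions added directly to the dylib first and consults the attached
generators only on a miss.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy stand-in for a JITDylib: a symbol table plus fallback generators.
// All names here are hypothetical; this is not the ORC API.
class ToyDylib {
public:
  using Address = std::uint64_t;
  using Generator = std::function<std::optional<Address>(const std::string &)>;

  // Record a definition that was added to the dylib directly.
  void define(const std::string &Name, Address Addr) { Symbols[Name] = Addr; }

  // Attach a generator that is consulted only when a lookup misses.
  void addGenerator(Generator G) { Generators.push_back(std::move(G)); }

  // Search our own definitions first, then ask each generator in order.
  std::optional<Address> lookup(const std::string &Name) {
    if (auto It = Symbols.find(Name); It != Symbols.end())
      return It->second;
    for (auto &G : Generators) {
      if (auto Addr = G(Name)) {
        Symbols[Name] = *Addr; // Remember the generated definition.
        return Addr;
      }
    }
    return std::nullopt;
  }

private:
  std::unordered_map<std::string, Address> Symbols;
  std::vector<Generator> Generators;
};

// Hypothetical generator standing in for "symbols from the host process".
// The addresses are made up for illustration.
inline ToyDylib::Generator makeHostProcessGenerator() {
  return [](const std::string &Name) -> std::optional<ToyDylib::Address> {
    static const std::unordered_map<std::string, ToyDylib::Address> Host = {
        {"sin", 0x1000}, {"printf", 0x2000}};
    if (auto It = Host.find(Name); It != Host.end())
      return It->second;
    return std::nullopt;
  };
}
```

With a generator attached, a lookup for a name that was never added to the
dylib (such as ``sin``) can still succeed, which is what lets JIT'd
Kaleidoscope code call functions that live in the REPL process.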
| |
| |
| .. code-block:: c++ |
| |
| static Expected<std::unique_ptr<KaleidoscopeJIT>> Create() { |
| auto JTMB = JITTargetMachineBuilder::detectHost(); |
| |
| if (!JTMB) |
| return JTMB.takeError(); |
| |
| auto DL = JTMB->getDefaultDataLayoutForTarget(); |
| if (!DL) |
| return DL.takeError(); |
| |
| return std::make_unique<KaleidoscopeJIT>(std::move(*JTMB), std::move(*DL)); |
| } |
| |
| const DataLayout &getDataLayout() const { return DL; } |
| |
| LLVMContext &getContext() { return *Ctx.getContext(); } |
| |
| Next we have a named constructor, ``Create``, which will build a KaleidoscopeJIT |
| instance that is configured to generate code for our host process. It does this |
| by first generating a JITTargetMachineBuilder instance using that class's |
| detectHost method and then using that instance to generate a DataLayout for |
| the target process. Each of these operations can fail, so each returns its |
| result wrapped in an Expected value [3]_ that we must check for error before |
| continuing. If both operations succeed we can unwrap their results (using the |
| dereference operator) and pass them into KaleidoscopeJIT's constructor on the |
| last line of the function. |
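
The check-then-dereference pattern can be illustrated without LLVM. The sketch
below is a simplified stand-in, not ``llvm::Expected``: ``ToyExpected``,
``detectTriple``, ``pointerSizeFor`` and ``createToy`` are hypothetical names,
and unlike the real class this toy does not force callers to check for errors.
It only shows the control flow of chaining two fallible steps and returning the
first failure.

```cpp
#include <string>
#include <utility>
#include <variant>

// Toy stand-in for llvm::Expected<T>: holds either a value or an error message.
// Unlike the real class, it does not enforce that errors are checked.
template <typename T> class ToyExpected {
public:
  ToyExpected(T Value) : Storage(std::move(Value)) {}
  static ToyExpected makeError(std::string Msg) {
    return ToyExpected(ErrorTag{}, std::move(Msg));
  }

  // True when a value is present, so `if (!E)` detects the error case.
  explicit operator bool() const { return std::holds_alternative<T>(Storage); }

  // Dereference unwraps the value; only valid after a successful check.
  T &operator*() { return std::get<T>(Storage); }

  // Hand the error message to the caller so it can be propagated.
  std::string takeError() { return std::get<ErrorMsg>(Storage).Text; }

private:
  struct ErrorTag {};
  struct ErrorMsg { std::string Text; };
  ToyExpected(ErrorTag, std::string Msg) : Storage(ErrorMsg{std::move(Msg)}) {}
  std::variant<T, ErrorMsg> Storage;
};

// Two hypothetical fallible steps, standing in for detectHost() and
// getDefaultDataLayoutForTarget().
inline ToyExpected<std::string> detectTriple(bool Ok) {
  if (!Ok)
    return ToyExpected<std::string>::makeError("could not detect host");
  return std::string("x86_64-unknown-linux-gnu");
}

inline ToyExpected<int> pointerSizeFor(const std::string &Triple) {
  if (Triple.rfind("x86_64", 0) != 0)
    return ToyExpected<int>::makeError("unknown target: " + Triple);
  return 8;
}

// Mirrors the shape of Create(): check each result, propagate the first error.
inline ToyExpected<int> createToy(bool HostOk) {
  auto Triple = detectTriple(HostOk);
  if (!Triple)
    return ToyExpected<int>::makeError(Triple.takeError());
  auto Size = pointerSizeFor(*Triple);
  if (!Size)
    return ToyExpected<int>::makeError(Size.takeError());
  return *Size;
}
```

The second step only runs if the first one produced a value, which matches the
order of checks in ``Create``.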
| |
| Following the named constructor we have the ``getDataLayout()`` and |
| ``getContext()`` methods. These are used to make data structures created and |
| managed by the JIT (especially the LLVMContext) available to the REPL code that |
| will build our IR modules. |
| |
| .. code-block:: c++ |
| |
| void addModule(std::unique_ptr<Module> M) { |
| cantFail(CompileLayer.add(ES.getMainJITDylib(), |
| ThreadSafeModule(std::move(M), Ctx))); |
| } |
| |
| Expected<JITEvaluatedSymbol> lookup(StringRef Name) { |
| return ES.lookup({&ES.getMainJITDylib()}, Mangle(Name.str())); |
| } |
| |
| Now we come to the first of our JIT API methods: addModule. This method is |
| responsible for adding IR to the JIT and making it available for execution. In |
| this initial implementation of our JIT we will make our modules "available for |
| execution" by adding them to the CompileLayer, which will in turn store the |
| Module in the main JITDylib. This process will create new symbol table entries |
| in the JITDylib for each definition in the module, and will defer compilation of |
| the module until any of its definitions is looked up. Note that this is not lazy |
| compilation: just referencing a definition, even if it is never used, will be |
| enough to trigger compilation. In later chapters we will teach our JIT to defer |
| compilation of functions until they're actually called. To add our Module we |
| must first wrap it in a ThreadSafeModule instance, which manages the lifetime of |
| the Module's LLVMContext (our Ctx member) in a thread-friendly way. In our |
| example, all modules will share the Ctx member, which will exist for the |
| duration of the JIT. Once we switch to concurrent compilation in later chapters |
| we will use a new context per module. |
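
The difference between deferring compilation until lookup and true lazy
compilation can be seen in a small model. The following is a sketch in plain
C++ under simplifying assumptions: ``ToyDeferredJIT`` and its members are
hypothetical, "compiling" is modelled as moving definitions from a pending
table into a compiled table, and symbols are never redefined. It is not how
ORC is implemented internally.

```cpp
#include <functional>
#include <memory>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy model of "compile the module when any of its symbols is looked up".
// Names are hypothetical; the real work is done by ORC's layers.
class ToyDeferredJIT {
public:
  using Fn = std::function<int()>;
  // A "module" is a set of named definitions that are compiled together.
  using Module = std::vector<std::pair<std::string, Fn>>;

  // Adding a module only records which symbols it defines.
  void addModule(Module M) {
    auto Pending = std::make_shared<Module>(std::move(M));
    for (auto &Def : *Pending)
      Unmaterialized[Def.first] = Pending;
  }

  // The first lookup of any symbol "compiles" the whole owning module.
  std::optional<Fn> lookup(const std::string &Name) {
    if (auto It = Unmaterialized.find(Name); It != Unmaterialized.end()) {
      auto Pending = It->second; // Keep the module alive while we erase.
      ++CompileCount;
      for (auto &Def : *Pending) {
        Compiled[Def.first] = Def.second;
        Unmaterialized.erase(Def.first);
      }
    }
    if (auto It = Compiled.find(Name); It != Compiled.end())
      return It->second;
    return std::nullopt;
  }

  // Number of modules compiled so far.
  int compileCount() const { return CompileCount; }

private:
  std::unordered_map<std::string, std::shared_ptr<Module>> Unmaterialized;
  std::unordered_map<std::string, Fn> Compiled;
  int CompileCount = 0;
};
```

In this model ``addModule`` does no compilation work, the first ``lookup`` of
any symbol compiles every definition in its module, and later lookups of
symbols from the same module are served from the compiled table. Note that the
lookup alone triggers the work, whether or not the returned function is ever
called.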
| |
| Our last method is ``lookup``, which allows us to look up addresses for |
| function and variable definitions added to the JIT based on their symbol names. |
| As noted above, lookup will implicitly trigger compilation for any symbol |
| that has not already been compiled. Our lookup method calls through to |
| ``ExecutionSession::lookup``, passing in a list of dylibs to search (in our case |
| just the main dylib), and the symbol name to search for, with a twist: We have |
| to *mangle* the name of the symbol we're searching for first. The ORC JIT |
| components use mangled symbols internally the same way a static compiler and |
| linker would, rather than using plain IR symbol names. This allows JIT'd code |
| to interoperate easily with precompiled code in the application or shared |
| libraries. The kind of mangling will depend on the DataLayout, which in turn |
| depends on the target platform. To allow us to remain portable and search based |
| on the un-mangled name, we just reproduce this mangling ourselves using our |
| ``Mangle`` member function object. |
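
As a rough illustration of what mangling means at this level, the sketch below
prepends a target's global prefix to an IR-level name. ``toyMangle`` is a
hypothetical helper, not part of LLVM, and it covers only the prefix rule:
Mach-O targets such as macOS use an underscore prefix, while ELF targets such
as Linux use none. The real mangler handles further cases that are ignored
here.

```cpp
#include <string>
#include <string_view>

// Toy version of linker-level name mangling: prepend the target's global
// prefix, if it has one. A prefix of '\0' means "no prefix".
inline std::string toyMangle(std::string_view Name, char GlobalPrefix) {
  std::string Mangled;
  if (GlobalPrefix != '\0')
    Mangled.push_back(GlobalPrefix);
  Mangled.append(Name.data(), Name.size());
  return Mangled;
}
```

Under these assumptions ``toyMangle("main", '_')`` yields ``_main`` and
``toyMangle("main", '\0')`` leaves the name unchanged, which is why the same
un-mangled name passed to ``lookup`` can resolve to different linker-level
symbols on different platforms.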
| |
| This brings us to the end of Chapter 1 of Building a JIT. You now have a basic |
| but fully functioning JIT stack that you can use to take LLVM IR and make it |
| executable within the context of your JIT process. In the next chapter we'll |
| look at how to extend this JIT to produce better quality code, and in the |
| process take a deeper look at the ORC layer concept. |
| |
| `Next: Extending the KaleidoscopeJIT <BuildingAJIT2.html>`_ |
| |
| Full Code Listing |
| ================= |
| |
| Here is the complete code listing for our running example. To build this |
| example, use: |
| |
| .. code-block:: bash |
| |
| # Compile |
| clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy |
| # Run |
| ./toy |
| |
| Here is the code: |
| |
| .. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter1/KaleidoscopeJIT.h |
| :language: c++ |
| |
| .. [1] Actually we use a cut-down version of KaleidoscopeJIT that makes a |
| simplifying assumption: symbols cannot be re-defined. This will make it |
| impossible to re-define symbols in the REPL, but will make our symbol |
| lookup logic simpler. Re-introducing support for symbol redefinition is |
| left as an exercise for the reader. (The KaleidoscopeJIT.h used in the |
| original tutorials will be a helpful reference). |
| |
| .. [2] +-----------------------------+-----------------------------------------------+ |
| | File | Reason for inclusion | |
| +=============================+===============================================+ |
| | JITSymbol.h | Defines the lookup result type | |
| | | JITEvaluatedSymbol | |
| +-----------------------------+-----------------------------------------------+ |
| | CompileUtils.h | Provides the ConcurrentIRCompiler class. | |
| +-----------------------------+-----------------------------------------------+ |
| | Core.h | Core utilities such as ExecutionSession and | |
| | | JITDylib. | |
| +-----------------------------+-----------------------------------------------+ |
| | ExecutionUtils.h | Provides the DynamicLibrarySearchGenerator | |
| | | class. | |
| +-----------------------------+-----------------------------------------------+ |
| | IRCompileLayer.h | Provides the IRCompileLayer class. | |
| +-----------------------------+-----------------------------------------------+ |
| | JITTargetMachineBuilder.h | Provides the JITTargetMachineBuilder class. | |
| +-----------------------------+-----------------------------------------------+ |
| | RTDyldObjectLinkingLayer.h | Provides the RTDyldObjectLinkingLayer class. | |
| +-----------------------------+-----------------------------------------------+ |
| | SectionMemoryManager.h | Provides the SectionMemoryManager class. | |
| +-----------------------------+-----------------------------------------------+ |
| | DataLayout.h | Provides the DataLayout class. | |
| +-----------------------------+-----------------------------------------------+ |
| | LLVMContext.h | Provides the LLVMContext class. | |
| +-----------------------------+-----------------------------------------------+ |
| |
| .. [3] See the ErrorHandling section in the LLVM Programmer's Manual |
| (https://llvm.org/docs/ProgrammersManual.html#error-handling) |