| ===================================================================== |
| Building a JIT: Adding Optimizations -- An introduction to ORC Layers |
| ===================================================================== |
| |
| .. contents:: |
| :local: |
| |
| **This tutorial is under active development. It is incomplete and details may |
| change frequently.** Nonetheless we invite you to try it out as it stands, and |
| we welcome any feedback. |
| |
| Chapter 2 Introduction |
| ====================== |
| |
| **Warning: This tutorial is currently being updated to account for ORC API |
| changes. Only Chapters 1 and 2 are up-to-date.** |
| |
| **Example code from Chapters 3 to 5 will compile and run, but has not been |
| updated** |
| |
| Welcome to Chapter 2 of the "Building an ORC-based JIT in LLVM" tutorial. In |
| `Chapter 1 <BuildingAJIT1.html>`_ of this series we examined a basic JIT |
| class, KaleidoscopeJIT, that could take LLVM IR modules as input and produce |
| executable code in memory. KaleidoscopeJIT was able to do this with relatively |
| little code by composing two off-the-shelf *ORC layers*: IRCompileLayer and |
| ObjectLinkingLayer, to do much of the heavy lifting. |
| |
| In this layer we'll learn more about the ORC layer concept by using a new layer, |
| IRTransformLayer, to add IR optimization support to KaleidoscopeJIT. |
| |
| Optimizing Modules using the IRTransformLayer |
| ============================================= |
| |
| In `Chapter 4 <LangImpl04.html>`_ of the "Implementing a language with LLVM" |
| tutorial series the llvm *FunctionPassManager* is introduced as a means for |
| optimizing LLVM IR. Interested readers may read that chapter for details, but |
| in short: to optimize a Module we create an llvm::FunctionPassManager |
| instance, configure it with a set of optimizations, then run the PassManager on |
| a Module to mutate it into a (hopefully) more optimized but semantically |
| equivalent form. In the original tutorial series the FunctionPassManager was |
| created outside the KaleidoscopeJIT and modules were optimized before being |
| added to it. In this Chapter we will make optimization a phase of our JIT |
| instead. For now this will provide us a motivation to learn more about ORC |
| layers, but in the long term making optimization part of our JIT will yield an |
| important benefit: When we begin lazily compiling code (i.e. deferring |
| compilation of each function until the first time it's run) having |
| optimization managed by our JIT will allow us to optimize lazily too, rather |
| than having to do all our optimization up-front. |
| |
| To add optimization support to our JIT we will take the KaleidoscopeJIT from |
| Chapter 1 and compose an ORC *IRTransformLayer* on top. We will look at how the |
| IRTransformLayer works in more detail below, but the interface is simple: the |
| constructor for this layer takes a reference to the execution session and the |
| layer below (as all layers do) plus an *IR optimization function* that it will |
| apply to each Module that is added via addModule: |
| |
| .. code-block:: c++ |
| |
| class KaleidoscopeJIT { |
| private: |
| ExecutionSession ES; |
| RTDyldObjectLinkingLayer ObjectLayer; |
| IRCompileLayer CompileLayer; |
| IRTransformLayer TransformLayer; |
| |
| DataLayout DL; |
| MangleAndInterner Mangle; |
| ThreadSafeContext Ctx; |
| |
| public: |
| |
| KaleidoscopeJIT(JITTargetMachineBuilder JTMB, DataLayout DL) |
| : ObjectLayer(ES, |
| []() { return std::make_unique<SectionMemoryManager>(); }), |
| CompileLayer(ES, ObjectLayer, ConcurrentIRCompiler(std::move(JTMB))), |
| TransformLayer(ES, CompileLayer, optimizeModule), |
| DL(std::move(DL)), Mangle(ES, this->DL), |
| Ctx(std::make_unique<LLVMContext>()) { |
| ES.getMainJITDylib().addGenerator( |
| cantFail(DynamicLibrarySearchGenerator::GetForCurrentProcess(DL.getGlobalPrefix()))); |
| } |
| |
| Our extended KaleidoscopeJIT class starts out the same as it did in Chapter 1, |
| but after the CompileLayer we introduce a new member, TransformLayer, which sits |
| on top of our CompileLayer. We initialize our OptimizeLayer with a reference to |
| the ExecutionSession and output layer (standard practice for layers), along with |
| a *transform function*. For our transform function we supply our classes |
| optimizeModule static method. |
| |
| .. code-block:: c++ |
| |
| // ... |
| return cantFail(OptimizeLayer.addModule(std::move(M), |
| std::move(Resolver))); |
| // ... |
| |
| Next we need to update our addModule method to replace the call to |
| ``CompileLayer::add`` with a call to ``OptimizeLayer::add`` instead. |
| |
| .. code-block:: c++ |
| |
| static Expected<ThreadSafeModule> |
| optimizeModule(ThreadSafeModule M, const MaterializationResponsibility &R) { |
| // Create a function pass manager. |
| auto FPM = std::make_unique<legacy::FunctionPassManager>(M.get()); |
| |
| // Add some optimizations. |
| FPM->add(createInstructionCombiningPass()); |
| FPM->add(createReassociatePass()); |
| FPM->add(createGVNPass()); |
| FPM->add(createCFGSimplificationPass()); |
| FPM->doInitialization(); |
| |
| // Run the optimizations over all functions in the module being added to |
| // the JIT. |
| for (auto &F : *M) |
| FPM->run(F); |
| |
| return M; |
| } |
| |
| At the bottom of our JIT we add a private method to do the actual optimization: |
| *optimizeModule*. This function takes the module to be transformed as input (as |
| a ThreadSafeModule) along with a reference to a reference to a new class: |
| ``MaterializationResponsibility``. The MaterializationResponsibility argument |
| can be used to query JIT state for the module being transformed, such as the set |
| of definitions in the module that JIT'd code is actively trying to call/access. |
| For now we will ignore this argument and use a standard optimization |
| pipeline. To do this we set up a FunctionPassManager, add some passes to it, run |
| it over every function in the module, and then return the mutated module. The |
| specific optimizations are the same ones used in `Chapter 4 <LangImpl04.html>`_ |
| of the "Implementing a language with LLVM" tutorial series. Readers may visit |
| that chapter for a more in-depth discussion of these, and of IR optimization in |
| general. |
| |
| And that's it in terms of changes to KaleidoscopeJIT: When a module is added via |
| addModule the OptimizeLayer will call our optimizeModule function before passing |
| the transformed module on to the CompileLayer below. Of course, we could have |
| called optimizeModule directly in our addModule function and not gone to the |
| bother of using the IRTransformLayer, but doing so gives us another opportunity |
| to see how layers compose. It also provides a neat entry point to the *layer* |
| concept itself, because IRTransformLayer is one of the simplest layers that |
| can be implemented. |
| |
| .. code-block:: c++ |
| |
| // From IRTransformLayer.h: |
| class IRTransformLayer : public IRLayer { |
| public: |
| using TransformFunction = std::function<Expected<ThreadSafeModule>( |
| ThreadSafeModule, const MaterializationResponsibility &R)>; |
| |
| IRTransformLayer(ExecutionSession &ES, IRLayer &BaseLayer, |
| TransformFunction Transform = identityTransform); |
| |
| void setTransform(TransformFunction Transform) { |
| this->Transform = std::move(Transform); |
| } |
| |
| static ThreadSafeModule |
| identityTransform(ThreadSafeModule TSM, |
| const MaterializationResponsibility &R) { |
| return TSM; |
| } |
| |
| void emit(MaterializationResponsibility R, ThreadSafeModule TSM) override; |
| |
| private: |
| IRLayer &BaseLayer; |
| TransformFunction Transform; |
| }; |
| |
| // From IRTransformLayer.cpp: |
| |
| IRTransformLayer::IRTransformLayer(ExecutionSession &ES, |
| IRLayer &BaseLayer, |
| TransformFunction Transform) |
| : IRLayer(ES), BaseLayer(BaseLayer), Transform(std::move(Transform)) {} |
| |
| void IRTransformLayer::emit(MaterializationResponsibility R, |
| ThreadSafeModule TSM) { |
| assert(TSM.getModule() && "Module must not be null"); |
| |
| if (auto TransformedTSM = Transform(std::move(TSM), R)) |
| BaseLayer.emit(std::move(R), std::move(*TransformedTSM)); |
| else { |
| R.failMaterialization(); |
| getExecutionSession().reportError(TransformedTSM.takeError()); |
| } |
| } |
| |
| This is the whole definition of IRTransformLayer, from |
| ``llvm/include/llvm/ExecutionEngine/Orc/IRTransformLayer.h`` and |
| ``llvm/lib/ExecutionEngine/Orc/IRTransformLayer.cpp``. This class is concerned |
| with two very simple jobs: (1) Running every IR Module that is emitted via this |
| layer through the transform function object, and (2) implementing the ORC |
| ``IRLayer`` interface (which itself conforms to the general ORC Layer concept, |
| more on that below). Most of the class is straightforward: a typedef for the |
| transform function, a constructor to initialize the members, a setter for the |
| transform function value, and a default no-op transform. The most important |
| method is ``emit`` as this is half of our IRLayer interface. The emit method |
| applies our transform to each module that it is called on and, if the transform |
| succeeds, passes the transformed module to the base layer. If the transform |
| fails, our emit function calls |
| ``MaterializationResponsibility::failMaterialization`` (this JIT clients who |
| may be waiting on other threads know that the code they were waiting for has |
| failed to compile) and logs the error with the execution session before bailing |
| out. |
| |
| The other half of the IRLayer interface we inherit unmodified from the IRLayer |
| class: |
| |
| .. code-block:: c++ |
| |
| Error IRLayer::add(JITDylib &JD, ThreadSafeModule TSM, VModuleKey K) { |
| return JD.define(std::make_unique<BasicIRLayerMaterializationUnit>( |
| *this, std::move(K), std::move(TSM))); |
| } |
| |
| This code, from ``llvm/lib/ExecutionEngine/Orc/Layer.cpp``, adds a |
| ThreadSafeModule to a given JITDylib by wrapping it up in a |
| ``MaterializationUnit`` (in this case a ``BasicIRLayerMaterializationUnit``). |
| Most layers that derived from IRLayer can rely on this default implementation |
| of the ``add`` method. |
| |
| These two operations, ``add`` and ``emit``, together constitute the layer |
| concept: A layer is a way to wrap a part of a compiler pipeline (in this case |
| the "opt" phase of an LLVM compiler) whose API is opaque to ORC with an |
| interface that ORC can call as needed. The add method takes an |
| module in some input program representation (in this case an LLVM IR module) |
| and stores it in the target ``JITDylib``, arranging for it to be passed back |
| to the layer's emit method when any symbol defined by that module is requested. |
| Each layer can complete its own work by calling the ``emit`` method of its base |
| layer. For example, in this tutorial our IRTransformLayer calls through to |
| our IRCompileLayer to compile the transformed IR, and our IRCompileLayer in |
| turn calls our ObjectLayer to link the object file produced by our compiler. |
| |
| So far we have learned how to optimize and compile our LLVM IR, but we have |
| not focused on when compilation happens. Our current REPL optimizes and |
| compiles each function as soon as it is referenced by any other code, |
| regardless of whether it is ever called at runtime. In the next chapter we |
| will introduce a fully lazy compilation, in which functions are not compiled |
| until they are first called at run-time. At this point the trade-offs get much |
| more interesting: the lazier we are, the quicker we can start executing the |
| first function, but the more often we will have to pause to compile newly |
| encountered functions. If we only code-gen lazily, but optimize eagerly, we |
| will have a longer startup time (as everything is optimized at that time) but |
| relatively short pauses as each function just passes through code-gen. If we |
| both optimize and code-gen lazily we can start executing the first function |
| more quickly, but we will have longer pauses as each function has to be both |
| optimized and code-gen'd when it is first executed. Things become even more |
| interesting if we consider interprocedural optimizations like inlining, which |
| must be performed eagerly. These are complex trade-offs, and there is no |
| one-size-fits all solution to them, but by providing composable layers we leave |
| the decisions to the person implementing the JIT, and make it easy for them to |
| experiment with different configurations. |
| |
| `Next: Adding Per-function Lazy Compilation <BuildingAJIT3.html>`_ |
| |
| Full Code Listing |
| ================= |
| |
| Here is the complete code listing for our running example with an |
| IRTransformLayer added to enable optimization. To build this example, use: |
| |
| .. code-block:: bash |
| |
| # Compile |
| clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy |
| # Run |
| ./toy |
| |
| Here is the code: |
| |
| .. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter2/KaleidoscopeJIT.h |
| :language: c++ |