| //===- README.txt - Information about the X86 backend and related files ---===// |
| // |
| // This file contains random notes and points of interest about the X86 backend. |
| // |
| //===----------------------------------------------------------------------===// |
| |
| =========== |
| I. Overview |
| =========== |
| |
| This directory contains a machine description for the X86 processor. Currently |
| this machine description is used for a high-performance code generator used by |
| an LLVM JIT. One of our main objectives for this project is to build a clean |
| code generator that may be extended in the future in a variety of ways: new |
| targets, new optimizations, new transformations, etc. |
| |
| This document describes the current state of the LLVM JIT, along with |
| implementation notes, design decisions, and other points of interest. |
| |
| |
| =================================== |
| II. Architecture / Design Decisions |
| =================================== |
| |
| We designed the infrastructure around the generic LLVM machine-specific |
| representation, which allows the framework to support as many targets as |
| possible. This framework should allow us to share many common machine-specific |
| transformations (register allocation, instruction scheduling, etc.) among all |
| of the backends that may eventually be supported by LLVM, and it ensures that |
| the JIT and static compiler backends are largely shared. |
| |
| At a high level, LLVM code is translated to a machine-specific representation |
| formed out of MachineFunction, MachineBasicBlock, and MachineInstr instances |
| (defined in include/llvm/CodeGen). This representation is completely target |
| agnostic, representing instructions in their most abstract form: an opcode, a |
| destination, and a series of operands. This representation is designed to |
| support both an SSA form for machine code and a register-allocated, non-SSA |
| form. |
| |
| Because the Machine* representation must work regardless of the target machine, |
| it contains very little semantic information about the program. To get semantic |
| information about the program, a layer of Target description data structures is |
| used, defined in include/llvm/Target. |
| |
| Note that there is some amount of complexity that the X86 backend contains due |
| to the Sparc backend's legacy requirements. These should eventually fade away |
| as the project progresses. |
| |
| |
| SSA Instruction Representation |
| ------------------------------ |
| Target machine instructions are represented as instances of MachineInstr, and |
| all specific machine instruction types should have an entry in the |
| InstructionInfo table defined through X86InstrInfo.def. In the X86 backend, |
| there are two particularly interesting forms of machine instruction: those that |
| produce a value (such as add), and those that do not (such as a store). |
| |
| Instructions that produce a value use Operand #0 as the "destination" register. |
| When printing the assembly code with the built-in machine instruction printer, |
| these destination registers will be printed to the left side of an '=' sign, as |
| in: %reg1027 = addl %reg1026, %reg1025 |
| |
| This 'addl' MachineInstr contains three "operands": the first is the |
| destination register (#1027), the second is the first source register (#1026), |
| and the third is the second source register (#1025). Never forget that the |
| destination register shows up in the MachineInstr operands vector. The code |
| to generate this instruction looks like this: |
| |
| BuildMI(BB, X86::ADDrr32, 2, 1027).addReg(1026).addReg(1025); |
| |
| The first argument to BuildMI is the basic block to append the machine |
| instruction to, the second is the opcode, the third is the number of operands |
| (2 here, matching the two addReg calls), and the fourth is the destination |
| register. The two addReg calls specify the source operands in order. |
| |
| MachineInstrs that do not produce a value do not have this implicit first |
| operand; they simply have #operands = #uses. To create them, simply do not |
| pass a destination register to the BuildMI call. |
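
As a rough illustration of the operand layout described above, here is a small
self-contained C++ sketch. ToyInstr, ToyBuilder, buildToy, and demoAdd are
hypothetical stand-ins invented for this example; they are not the real
MachineInstr or BuildMI, and the opcode numbers are arbitrary. The sketch only
mirrors the convention that a value-producing instruction stores its
destination as operand #0, while an instruction without a destination stores
only its uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for MachineInstr: an opcode plus a flat operand list.
struct ToyInstr {
  int Opcode;
  bool HasDest;                    // true if operand #0 is a destination
  std::vector<unsigned> Operands;  // register numbers, destination first
};

// Hypothetical builder that mimics the chained addReg style of BuildMI.
class ToyBuilder {
  ToyInstr &MI;
public:
  explicit ToyBuilder(ToyInstr &I) : MI(I) {}
  ToyBuilder &addReg(unsigned Reg) {
    MI.Operands.push_back(Reg);
    return *this;
  }
};

// Append a value-producing instruction: DestReg becomes operand #0.
// NumOperands counts only the operands added afterwards (an assumption
// taken from the addl example in the text).
inline ToyBuilder buildToy(std::vector<ToyInstr> &BB, int Opcode,
                           std::size_t NumOperands, unsigned DestReg) {
  BB.push_back(ToyInstr{Opcode, true, {}});
  BB.back().Operands.reserve(NumOperands + 1);
  BB.back().Operands.push_back(DestReg);
  return ToyBuilder(BB.back());
}

// Append an instruction that produces no value: #operands = #uses.
inline ToyBuilder buildToy(std::vector<ToyInstr> &BB, int Opcode,
                           std::size_t NumOperands) {
  BB.push_back(ToyInstr{Opcode, false, {}});
  BB.back().Operands.reserve(NumOperands);
  return ToyBuilder(BB.back());
}

// Builds the toy equivalent of "%reg1027 = addl %reg1026, %reg1025" and
// returns how many entries ended up in its operand list.
inline std::size_t demoAdd() {
  std::vector<ToyInstr> BB;
  buildToy(BB, /*Opcode=*/1, 2, 1027).addReg(1026).addReg(1025);
  return BB.back().Operands.size();
}
```

In this toy model the add carries three operand entries even though the count
passed to the builder is 2, which is the point the text warns about: the
destination occupies slot #0 of the operand list.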
| |
| |
| ====================== |
| IV. Source Code Layout |
| ====================== |
| |
| The LLVM-JIT is composed of source files primarily in the following locations: |
| |
| include/llvm/CodeGen |
| -------------------- |
| This directory contains header files that are used to represent the program in a |
| machine specific representation. It currently also contains a bunch of stuff |
| used by the Sparc backend that we don't want to get mixed up in, such as |
| register allocation internals. |
| |
| include/llvm/Target |
| ------------------- |
| This directory contains header files that are used to interpret the machine |
| specific representation of the program. This allows us to write generic |
| transformations that will work on any target that implements the interfaces |
| defined in this directory. The only classes used by the X86 backend so far are |
| the TargetMachine, TargetData, MachineInstrInfo, and MRegisterInfo classes. |
| |
| lib/CodeGen |
| ----------- |
| This directory will contain all of the target independent transformations (for |
| example, register allocation) that we write. These transformations should only |
| use information exposed through the Target interface; they should not include |
| any target specific header files. |
| |
| lib/Target/X86 |
| -------------- |
| This directory contains the machine description for X86 that is required for |
| the rest of the compiler to work. It contains any code that is truly specific |
| to the X86 backend, for example the instruction selector and machine code |
| emitter. |
| |
| tools/lli/JIT |
| ------------- |
| This directory contains the top-level code for the JIT compiler. This code |
| basically boils down to a call to TargetMachine::addPassesToJITCompile. As we |
| progress with the project, this will also contain the compile-dispatch-recompile |
| loop. |
| |
| test/Regression/Jello |
| --------------------- |
| This directory contains regression tests for the JIT. |
| |
| |
| ================================================== |
| V. Strange Things, or, Things That Should Be Known |
| ================================================== |
| |
| Representing memory in MachineInstrs |
| ------------------------------------ |
| |
| The x86 has a very, uhm, flexible way of accessing memory. It can directly |
| address memory locations of the following form in integer instructions (which |
| use ModR/M addressing): |
| |
| Base+[1,2,4,8]*IndexReg+Disp32 |
| |
| Wow, that's crazy. In order to represent this, LLVM tracks no fewer than 4 |
| operands for each memory operand of this form. This means that the "load" form |
| of 'mov' has the following "Operands" in this order: |
| |
| Index:        0     |    1         2        3           4 |
| Meaning:   DestReg, | BaseReg,   Scale,  IndexReg,  Displacement |
| OperandTy: VirtReg, | VirtReg,   UnsImm, VirtReg,   SignExtImm |
| |
| Stores and all other instructions treat the four memory operands in the same |
| way, in the same order. |
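
To make the four-operand layout concrete, the following C++ sketch models a
memory reference and computes the address it denotes. The names ToyMemRef,
isValidScale, and effectiveAddress are hypothetical, and the "register file" is
just an array supplied by the caller; none of this is code from the backend.
It assumes 32-bit wraparound arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the four memory operands, in the documented order.
struct ToyMemRef {
  unsigned BaseReg;     // index into the caller's register array
  std::uint32_t Scale;  // one of 1, 2, 4, 8
  unsigned IndexReg;    // index into the caller's register array
  std::int32_t Disp;    // sign-extended 32-bit displacement
};

// Only the four scale factors listed in the addressing form are accepted.
inline bool isValidScale(std::uint32_t S) {
  return S == 1 || S == 2 || S == 4 || S == 8;
}

// Computes Base + Scale*IndexReg + Disp32 with 32-bit wraparound.
inline std::uint32_t effectiveAddress(const ToyMemRef &M,
                                      const std::uint32_t *Regs) {
  assert(isValidScale(M.Scale));
  return Regs[M.BaseReg] + M.Scale * Regs[M.IndexReg] +
         static_cast<std::uint32_t>(M.Disp);
}
```

For example, with register 0 holding 0x1000 and register 1 holding 3, the
reference {BaseReg 0, Scale 4, IndexReg 1, Disp -8} denotes 0x1000 + 12 - 8.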
| |
| |
| ========================== |
| VI. TODO / Future Projects |
| ========================== |
| |
| There are a large number of things remaining to do. Here is a partial list: |
| |
| Next Phase: |
| ----------- |
| 1. Implement linear time optimal instruction selector |
| 2. Implement smarter (linear scan?) register allocator |
| |
| After this project: |
| ------------------- |
| 1. Implement lots of nifty runtime optimizations |
| 2. Implement new targets: IA64? X86-64? M68k? MMIX? Who knows... |
| |
| Infrastructure Improvements: |
| ---------------------------- |
| |
| 1. The bytecode format is designed so that particular functions can be read |
| from it without having to read the whole program. The bytecode reader should |
| be extended to allow on-demand loading of functions. |
| |
| 2. X86/Printer.cpp and Sparc/EmitAssembly.cpp both have copies of what is |
| roughly the same code, used to output constants in a form the assembler |
| can understand. These functions should be shared at some point. They |
| should be rewritten to pass around iostreams instead of strings. The |
| list of functions is as follows: |
| |
| isStringCompatible |
| toOctal |
| ConstantExprToString |
| valToExprString |
| getAsCString |
| printSingleConstantValue (with TypeToDataDirective inlined) |
| printConstantValueOnly |
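
As a sketch of the "pass around iostreams instead of strings" idea in item 2,
the helper below writes an escaped string literal directly to a std::ostream.
The names writeCString, toCString, and toOctalDigit, and the exact escaping
rules (backslash escapes for quote and backslash, three-digit octal escapes for
non-printable bytes), are assumptions made for illustration; this is not the
code from X86/Printer.cpp or Sparc/EmitAssembly.cpp.

```cpp
#include <cassert>
#include <cctype>
#include <ostream>
#include <sstream>
#include <string>

// Map the low three bits of X to an octal digit character.
inline char toOctalDigit(unsigned X) {
  return static_cast<char>('0' + (X & 7));
}

// Write Bytes to OS as a double-quoted string literal, escaping as it goes,
// so no intermediate std::string has to be built and returned.
inline void writeCString(std::ostream &OS, const std::string &Bytes) {
  OS << '"';
  for (unsigned char C : Bytes) {
    if (C == '"' || C == '\\') {
      OS << '\\' << static_cast<char>(C);       // escape quote and backslash
    } else if (std::isprint(C)) {
      OS << static_cast<char>(C);               // printable: emit as-is
    } else {
      OS << '\\' << toOctalDigit(C >> 6)        // non-printable: \ooo
         << toOctalDigit(C >> 3) << toOctalDigit(C);
    }
  }
  OS << '"';
}

// Convenience wrapper for callers that still want a std::string.
inline std::string toCString(const std::string &Bytes) {
  std::ostringstream OS;
  writeCString(OS, Bytes);
  return OS.str();
}
```

Writing to the stream directly lets a shared helper serve both printers
without either one concatenating temporary strings; the string-returning
wrapper is kept only to ease migration of existing callers.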