=============================================
Machine Learning - Guided Optimization (MLGO)
=============================================

Introduction
============

MLGO refers to integrating ML techniques (primarily) to replace heuristics within
LLVM with machine learned models.

Currently the following heuristics feature such integration:

* Inlining for size
* Register allocation (LLVM greedy eviction heuristic) for performance

This document is an outline of the tooling and APIs facilitating MLGO.

.. note::

  The tools for orchestrating ML training are not part of LLVM, as they are
  dependency-heavy - both on the ML infrastructure choice, as well as choices of
  distributed computing. For the training scenario, LLVM only contains facilities
  enabling it, such as corpus extraction, training data extraction, and evaluation
  of models during training.


.. contents::

Corpus Tooling
==============

Within the LLVM monorepo, there is the ``mlgo-utils`` Python package that
lives at ``llvm/utils/mlgo-utils``. This package primarily contains tooling
for working with corpora, or collections of LLVM bitcode. We use these corpora
to train and evaluate ML models. Corpora consist of a description in JSON
format at ``corpus_description.json`` in the root of the corpus, and then
a bitcode file and a command line flags file for each extracted module. The
corpus structure is designed to contain sufficient information to fully
compile the bitcode to bit-identical object files.
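
For orientation, a freshly extracted corpus is laid out roughly as follows. The
per-module file names below are illustrative placeholders; the actual names
mirror the layout of the build the modules were extracted from:

.. code-block:: none

  corpus/
    corpus_description.json   # JSON description of the corpus contents
    lib/Foo/foo.cc.o.bc       # extracted bitcode for one module
    lib/Foo/foo.cc.o.cmd      # command line flags used to compile that module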

.. program:: extract_ir.py

Synopsis
--------

Extracts a corpus from some form of a structured compilation database. This
tool supports a variety of different scenarios and input types.

Options
-------

.. option:: --input

  The path to the input. This should be a path to a supported structured
  compilation database. Currently only ``compile_commands.json`` files, linker
  parameter files, a directory containing object files (for the local
  ThinLTO case only), or a JSON file containing a bazel aquery result are
  supported.

.. option:: --input_type

  The type of input that has been passed to the ``--input`` flag.

.. option:: --output_dir

  The output directory to place the corpus in.

.. option:: --num_workers

  The number of workers to use for extracting bitcode into the corpus. This
  defaults to the number of hardware threads available on the host system.

.. option:: --llvm_objcopy_path

  The path to the llvm-objcopy binary to use when extracting bitcode.

.. option:: --obj_base_dir

  The base directory for object files. Bitcode files that get extracted into
  the corpus will be placed into the output directory based on where their
  source object files are placed relative to this path.

.. option:: --cmd_filter

  Allows filtering of modules by command line. If set, only modules whose
  command lines match the filter will be extracted into the corpus. Regular
  expressions are supported in some instances.

.. option:: --thinlto_build

  If the build was performed with ThinLTO, this should be set to either
  ``distributed`` or ``local`` depending upon how the build was performed.

.. option:: --cmd_section_name

  This flag allows specifying the command line section name. This is needed
  on non-ELF platforms where the section name might differ.

.. option:: --bitcode_section_name

  This flag allows specifying the bitcode section name. This is needed on
  non-ELF platforms where the section name might differ.

Example: CMake
--------------

CMake can output a ``compile_commands.json`` compilation database if the
``CMAKE_EXPORT_COMPILE_COMMANDS`` switch is turned on at configure time. It is
also necessary to enable bitcode embedding (done by passing
``-Xclang -fembed-bitcode=all`` to all C/C++ compilation actions in the
non-ThinLTO case). For example, to extract a corpus from clang, you would
run the following commands (assuming that the system C/C++ compiler is clang):

.. code-block:: bash

  cmake -GNinja \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
    -DCMAKE_C_FLAGS="-Xclang -fembed-bitcode=all" \
    -DCMAKE_CXX_FLAGS="-Xclang -fembed-bitcode=all" \
    ../llvm
  ninja

After running CMake and building the project, there should be a
``compile_commands.json`` file within the build directory. You can then
run the following command to create a corpus:

.. code-block:: bash

  python3 ./extract_ir.py \
    --input=./build/compile_commands.json \
    --input_type=json \
    --output_dir=./corpus

After running the above command, there should be a full
corpus of bitcode within the ``./corpus`` directory.

Example: Bazel Aquery
---------------------

This tool also supports extracting bitcode from bazel in multiple ways
depending upon the exact configuration. For ThinLTO, a linker parameters file
is preferred. For the non-ThinLTO case, the script will accept the output of
``bazel aquery``, which it will use to find all the object files that are linked
into a specific target and then extract bitcode from them. First, you need
to generate the aquery output:

.. code-block:: bash

  bazel aquery --output=jsonproto //path/to:target > /path/to/aquery.json

Afterwards, assuming that the build is already complete, you can run this
script to create a corpus:

.. code-block:: bash

  python3 ./extract_ir.py \
    --input=/path/to/aquery.json \
    --input_type=bazel_aquery \
    --output_dir=./corpus \
    --obj_base_dir=./bazel-bin

This will again leave a corpus that contains all the bitcode files. Note
however that this mode does not capture all object files in the build, only
the ones that are involved in the link of the binary passed to the
``bazel aquery`` invocation.

.. program:: make_corpus.py

Synopsis
--------

Creates a corpus from a collection of bitcode files.

Options
-------

.. option:: --input_dir

  The input directory to search for bitcode files in.

.. option:: --output_dir

  The output directory to place the constructed corpus in.

.. option:: --default_args

  A list of space separated flags that are put into the corpus description.
  These are used by some tooling when compiling the modules within the corpus.

.. program:: combine_training_corpus.py

Synopsis
--------

Combines two training corpora that share the same parent folder by generating
a new ``corpus_description.json`` that contains all the modules in both corpora.

Options
-------

.. option:: --root_dir

  The root directory that contains subfolders consisting of the corpora that
  should be combined.

Interacting with ML models
==========================

We interact with ML models in two primary scenarios: one is training such a
model, and the other, inference, is using a model during compilation to make
optimization decisions.

For a specific optimization problem - i.e. inlining, or regalloc eviction - we
first separate correctness-preserving decisions from optimization decisions.
For example, not inlining functions marked "no inline" is an example of the
former. The same goes for not evicting an unevictable live range. An example of
the latter is deciding to inline a function that will bloat the caller size,
just because we have reason to believe that later, the effect will be some
constant propagation that will actually reduce the size (or dynamic instruction
count).

ML models can be understood as functions. Their inputs are tensors - buffers of
scalars. The output (in our case, singular) is a scalar. For example, for
inlining, the inputs are properties of the caller, callee, and the callsite
being analyzed for inlining. The output is a boolean.

Inputs and outputs are named, have a scalar type (e.g. int32_t) and a shape
(e.g. 3x4). These are the elements that we use to bind to an ML model.

In both training and inference, we want to expose to ML (training algorithms or
trained model, respectively) the features we want to make optimization
decisions on. In that regard, the interface from the compiler side to the ML
side is the same: pass features, and get a decision. It's essentially a function
call, where the parameters and result are bound by name and are described by
name, scalar type, and shape tuples.

The main types in LLVM are:

- ``MLModelRunner`` - an abstraction for the decision making mechanism
- ``TensorSpec`` which describes a tensor.

TensorSpec
----------

See ``llvm/Analysis/TensorSpec.h``. This is a simple data bag, identifying a
tensor by name (a string), scalar type, and shape (a vector of ints). The scalar
type can only be int (8, 16, 32, or 64), signed or unsigned; float; or double.

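As an illustration, specs for a couple of inlining-style features might be
created as sketched below (a minimal example; the feature names here are made
up, and the exact factory API is best checked against ``TensorSpec.h``):

.. code-block:: c++

  #include "llvm/Analysis/TensorSpec.h"

  using namespace llvm;

  // A scalar int64 feature, e.g. "callee basic block count". Scalars are
  // modeled as 1-element tensors, hence the {1} shape.
  TensorSpec CalleeBBCount =
      TensorSpec::createSpec<int64_t>("callee_bb_count", {1});

  // A hypothetical 3x4 float feature, to show a multi-dimensional shape.
  TensorSpec SomeMatrixFeature =
      TensorSpec::createSpec<float>("some_matrix_feature", {3, 4});

The name, scalar type, and shape captured by each spec are what the
``MLModelRunner`` implementations below use to bind the feature to the
underlying model.
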

MLModelRunner
-------------

See ``llvm/Analysis/MLModelRunner.h``. The abstraction has a pure virtual,
``evaluateUntyped``, but the contract with implementers is a bit more involved:

Implementers
^^^^^^^^^^^^

At construction, the implementer is expected to receive a list of ``TensorSpec``
for input features and the ``TensorSpec`` of the output (e.g.
``std::vector<TensorSpec>``). The list type is not contractual, but it must be
a 0-based indexing array-like container. Given a ``TensorSpec`` at index "I" in
the input list, that has a name "N", shape "D1 x D2 x ... x Dn", and scalar type
"T", the implementer must:

- set up a contiguous buffer sized ``sizeof(T) * D1 * D2 * ... * Dn``. This
  buffer's lifetime must be the same as the lifetime of the implementer object.
- call ``MLModelRunner::setUpBufferForTensor`` passing I, the ``TensorSpec``,
  and the buffer above.

Internally, the expectation is that the implementer uses the name (and maybe
shape) of a ``TensorSpec`` for binding (e.g. lookup in an underlying ML model).

``MLModelRunner::setUpBufferForTensor`` stores each buffer at the corresponding
index (i.e. its position in the list used at construction). The expectation is
that the user will use that position when calling ``MLModelRunner::getTensor``
to retrieve the underlying buffer (more on that in a bit).

The implementation of ``evaluateUntyped`` is expected to use the values in the
buffers described above, carry out whatever computation (e.g. evaluate an ML
model) and then place the outcome in an output buffer which will be returned to
the caller. Importantly, ``evaluateUntyped`` must not reset the input buffers.
This is because during training we may want to log the features and decisions,
and since the data is already buffered, there's no reason to force backing it
up elsewhere.
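
To make the above contract concrete, here is a minimal sketch of what the
constructor of a hypothetical implementation might do. ``Inputs``, ``Buffers``
and ``tensorSizeInBytes`` are illustrative names, not existing LLVM APIs; only
``setUpBufferForTensor`` is part of the actual contract:

.. code-block:: c++

  // Sketch: constructor body of a hypothetical MLModelRunner subclass.
  // - Inputs is the std::vector<TensorSpec> describing the input features.
  // - Buffers is a member (std::vector<std::unique_ptr<char[]>>) owning the
  //   feature storage, so its lifetime matches the runner's.
  // - tensorSizeInBytes is a hypothetical helper returning
  //   sizeof(T) * D1 * ... * Dn for a given spec.
  for (size_t I = 0, E = Inputs.size(); I != E; ++I) {
    const TensorSpec &Spec = Inputs[I];
    Buffers.emplace_back(std::make_unique<char[]>(tensorSizeInBytes(Spec)));
    // Register the buffer with the base class; index I is how users will
    // later retrieve it via getTensor / getTensorUntyped.
    setUpBufferForTensor(I, Spec, Buffers.back().get());
    // Internally, the implementation would also bind the spec's name (and
    // shape) to the corresponding input of the underlying model.
  }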

Users
^^^^^

The users must pass the input ``TensorSpec`` list at the construction of a
specific ``MLModelRunner`` object. After that, users can be agnostic of the
specific implementation, and would typically follow this workflow:

- call ``getTensor`` or ``getTensorUntyped``, for each input tensor, identified
  by its index (i.e. the index of the corresponding ``TensorSpec`` in the list
  used at construction).
- populate the tensor buffer of each input tensor with values. Users can take
  advantage of the stability of the tensor buffers, e.g. by setting only once
  the buffers of features that don't change, or by caching the buffer addresses.
- call ``evaluate`` and use its result. A sketch of this workflow follows the
  list.
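
Putting the steps above together, a minimal user-side sketch might look as
follows. The feature indices and their meanings are made up for illustration,
and the output scalar type is assumed to be ``int64_t``:

.. code-block:: c++

  #include "llvm/Analysis/MLModelRunner.h"

  using namespace llvm;

  // Make one decision with an already-constructed runner. Indices 0 and 1
  // correspond to positions in the TensorSpec list the runner was constructed
  // with (hypothetical features, for illustration only).
  bool shouldInline(MLModelRunner &Runner, int64_t CalleeBBCount,
                    int64_t CallSiteHeight) {
    // getTensor returns the stable backing buffer for the feature at the
    // given index; writing through it populates the model input.
    *Runner.getTensor<int64_t>(0) = CalleeBBCount;
    *Runner.getTensor<int64_t>(1) = CallSiteHeight;

    // Evaluate and interpret the result. The scalar type passed to evaluate
    // must match the output spec the runner was set up with (assumed int64_t).
    return Runner.evaluate<int64_t>() > 0;
  }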

Versioning
^^^^^^^^^^

We support a model "knowing" fewer inputs than the compiler. This is supported
by ``MLModelRunner::setUpBufferForTensor``. If a ``TensorSpec`` requested by the
compiler is not supported by the underlying model, the ``MLModelRunner``
implementer must still call ``setUpBufferForTensor`` with a ``nullptr`` value
for the buffer. In turn, ``MLModelRunner`` will allocate an appropriately-sized
buffer and track its lifetime. The user can safely populate that buffer. Since
the rest of the inputs are still provided, this allows an evolution model where
we first add features to the compiler and continue using older models without
regressing. Then, the new compiler can be used to train new models. Deprecating
features in the compiler involves, then, training first a model without those
features.

``MLModelRunner`` implementations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We currently feature 4 implementations:

- ``ModelUnderTrainingRunner``. This requires the compiler be built with TFLite
  support. It allows loading a TFLite model dynamically and is primarily
  intended for training scenarios, but it can be used relatively easily in
  production build environments, as it does not change how the compiler operates
  (why this remark is necessary will become clear in a few paragraphs)

- ``ReleaseModeModelRunner``. This is intended for inference scenarios. This
  uses the rules defined in ``llvm/cmake/modules/TensorFlowCompile.cmake`` to
  convert, at the time the compiler is built, TensorFlow Saved Models into a
  header (.h) and native object (.o). The latter is a CPU-based implementation of
  the neural network, together with its weights (essentially, loops performing
  matrix multiplications)

  .. note::

    We are actively working on replacing this with an EmitC implementation
    requiring no out-of-tree build-time dependencies.

- ``InteractiveModelRunner``. This is intended for training scenarios where the
  training algorithm drives compilation. This model runner has no special
  dependencies, and relies on I/O pipes to communicate with a separate process,
  presumably a python training algorithm. We do not envision using this in a
  production environment.

- ``NoInferenceModelRunner``. This serves as a store for feature values, and its
  ``evaluate`` should never be called. It's used for training scenarios, when we
  want to capture the behavior of the default (non-ML) heuristic.

Note that training leaves it to the training infrastructure to handle
distributed computing. The assumed architecture has python processes
communicating remotely between themselves, but managing local communication with
clang.

Logging Facility
----------------

When training models, we need to expose the features we will want to use during
inference, as well as outcomes, to guide reward-based learning techniques. This
can happen in two forms:

- when running the compiler on some input, as a capture of the features and
  actions taken by some policy or a model currently being used.
  For example, see ``DevelopmentModeInlineAdvisor`` or ``DevelopmentModeEvictAdvisor``
  in ``MLRegallocEvictAdvisor.cpp``. In more detail, in the former case, if
  ``-training-log`` is specified, the features and actions (inline/no inline)
  from each inlining decision are saved to the specified file. Since
  ``MLModelRunner`` implementations hold on to feature values (they don't get
  cleared by ``evaluate``), logging is easily supported by just looping over the
  model runner's features and passing the tensor buffers to the logger (see the
  sketch after this list). Note how we use the ``NoInferenceModelRunner`` to
  capture the features observed when using the default policy.

- as a serialization mechanism for the ``InteractiveModelRunner``. Here, we need
  to pass the observed features over IPC (a file descriptor, likely a named
  pipe).
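
In the first case, the logging step boils down to something like the following
minimal sketch, where ``FeatureSpecs`` is the ``TensorSpec`` list the runner was
constructed with and ``logTensor`` is a hypothetical callback standing in for
however the data actually gets written (the real code uses the training logger
described below):

.. code-block:: c++

  // After an evaluation (or after capturing the default heuristic's decision),
  // the feature values are still in the runner's buffers, so they can be
  // handed to the logger directly, in spec order.
  for (size_t I = 0, E = FeatureSpecs.size(); I != E; ++I)
    logTensor(FeatureSpecs[I], Runner.getTensorUntyped(I));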

Both cases require serializing the same kind of data and we support both with
``Analysis/Utils/TrainingLogger``.

The goal of the logger design was to avoid any new dependency, and to optimize
for the tensor scenario - i.e. exchanging potentially large buffers of fixed
size, containing scalars. We explicitly assume the reader of the format has the
same endianness as the compiler host, and we further expect the reader and the
compiler to run on the same host. This is because we expect the training
scenarios to have a (typically python) process managing the compiler process,
and we leave it to the training side to handle remoting.

The logger produces the following sequence (an illustrative skeleton is shown
after this description):

- a header describing the structure of the log. This is a one-line textual JSON
  dictionary with the following elements:

  - ``features``: a list of JSON-serialized ``TensorSpec`` values. The position
    in the list matters, as it will be the order in which values will be
    subsequently recorded. If we are just logging (i.e. not using the
    ``InteractiveModelRunner``), the last feature should be that of the action
    (e.g. "inline/no inline", or "index of evicted live range")
  - (optional) ``score``: a ``TensorSpec`` describing a value we will include to
    help formulate a reward. This could be a size estimate or a latency estimate.
  - (optional) ``advice``: a ``TensorSpec`` describing the action. This is used
    for the ``InteractiveModelRunner``, in which case it shouldn't be in the
    ``features`` list.

- a sequence of ``contexts``. Contexts are independent traces of the optimization
  problem. For module passes, there is only one context, for function passes,
  there is a context per function. The start of a context is marked with a
  one-line JSON dictionary of the form ``{"context": <context name, a string>}``

  Each context has a sequence of:

  - ``observations``. An observation is:

    - a one-line JSON ``{"observation": <observation number, 0-indexed>}``
    - a binary dump of the tensor buffers, in the order in which they were
      specified in the header.
    - a new line character
    - if ``score`` was specified in the header:

      - a one-line JSON object ``{"outcome": <value>}``, where the ``value``
        conforms to the ``TensorSpec`` defined for the ``score`` in the header.
      - the outcome value, as a binary dump
      - a new line character.
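
Schematically, a log with a ``score`` therefore looks like the following
(placeholders in angle brackets; the tensor dumps are raw bytes, shown here
only to indicate where they appear):

.. code-block:: none

  {"features": [<TensorSpec JSON>, ...], "score": <TensorSpec JSON>}
  {"context": "some_context_name"}
  {"observation": 0}
  <binary dump of the tensor buffers>
  {"outcome": <value>}
  <binary dump of the score>
  {"observation": 1}
  ...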

The format uses a mix of textual JSON (for headers) and binary dumps (for tensors)
because the headers are not expected to dominate the payload - the tensor values
are. We wanted to avoid burdening the log reader - likely python - with
additional dependencies; and the one-line JSON makes it rudimentarily possible
to inspect a log without additional tooling.

A python utility for reading logs, used for tests, is available at
``Analysis/models/log_reader.py``. A utility showcasing the ``InteractiveModelRunner``,
which uses this reader as well, is at ``Analysis/models/interactive_host.py``.
The latter is also used in tests.

There is no C++ implementation of a log reader. We do not have a scenario
motivating one.

IR2Vec Embeddings
=================

IR2Vec is a program embedding approach designed specifically for LLVM IR. It
is implemented as a function analysis pass in LLVM. The IR2Vec embeddings
capture syntactic, semantic, and structural properties of the IR through
learned representations. These representations are made available as a JSON
vocabulary that maps the entities of the IR (opcodes, types, operands) to
n-dimensional floating point vectors (embeddings).

With IR2Vec, representations at different granularities of IR, such as
instructions, functions, and basic blocks, can be obtained. Representations
of loops and regions can be derived from these representations, which can be
useful in different scenarios. The representations can be useful for various
downstream tasks, including ML-guided compiler optimizations.

The core components are:

- **Vocabulary**: A mapping from IR entities (opcodes, types, etc.) to their
  vector representations. This is managed by ``IR2VecVocabAnalysis``. The
  vocabulary (.json file) contains three sections -- Opcodes, Types, and
  Arguments, each containing the representations of the corresponding
  entities.

  .. note::

    It is mandatory for all three sections to be present in the vocabulary file
    for it to be valid; the order in which they appear does not matter.

- **Embedder**: A class (``ir2vec::Embedder``) that uses the vocabulary to
  compute embeddings for instructions, basic blocks, and functions.

Using IR2Vec
------------

.. note::

  This section describes how to use IR2Vec within LLVM passes. A standalone
  tool :doc:`CommandGuide/llvm-ir2vec` is available for generating the
  embeddings and triplets from LLVM IR files, which can be useful for
  training vocabularies and generating embeddings outside of compiler passes.

For generating embeddings, first the vocabulary should be obtained. Then, the
embeddings can be computed and accessed via an ``ir2vec::Embedder`` instance.

1. **Get the Vocabulary**:
   In a ModulePass, get the vocabulary analysis result:

   .. code-block:: c++

     auto &VocabRes = MAM.getResult<IR2VecVocabAnalysis>(M);
     if (!VocabRes.isValid()) {
       // Handle error: vocabulary is not available or invalid
       return;
     }
     const ir2vec::Vocab &Vocabulary = VocabRes.getVocabulary();

   Note that the ``IR2VecVocabAnalysis`` pass is immutable.

2. **Create Embedder instance**:
   With the vocabulary, create an embedder for a specific function:

   .. code-block:: c++

     // Assuming F is an llvm::Function&
     // For example, using IR2VecKind::Symbolic:
     std::unique_ptr<ir2vec::Embedder> Emb =
         ir2vec::Embedder::create(IR2VecKind::Symbolic, F, Vocabulary);

3. **Compute and Access Embeddings**:
   Call ``getFunctionVector()`` to get the embedding for the function.

   .. code-block:: c++

     const ir2vec::Embedding &FuncVector = Emb->getFunctionVector();

   Currently, ``Embedder`` can generate embeddings at three levels: Instructions,
   Basic Blocks, and Functions. Appropriate getters are provided to access the
   embeddings at these levels.

   .. note::

     The validity of the ``Embedder`` instance (and the embeddings it generates)
     is tied to the function it is associated with remaining unchanged. If the
     function is modified, the embeddings may become stale and should be
     recomputed accordingly.

4. **Working with Embeddings**:
   Embeddings are represented as ``std::vector<double>``. These vectors can be
   used as features for machine learning models, to compute similarity scores
   between different code snippets, or to perform other analyses as needed.
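
For instance, a similarity score between two embeddings can be computed with
plain standard-library code, since they are just vectors of doubles (this is a
generic cosine-similarity sketch, not an API provided by IR2Vec):

.. code-block:: c++

  #include <cassert>
  #include <cmath>
  #include <numeric>
  #include <vector>

  // Cosine similarity of two equally-sized embedding vectors, in [-1, 1].
  double cosineSimilarity(const std::vector<double> &A,
                          const std::vector<double> &B) {
    assert(A.size() == B.size() && "embeddings must have the same dimension");
    double Dot = std::inner_product(A.begin(), A.end(), B.begin(), 0.0);
    double NormA =
        std::sqrt(std::inner_product(A.begin(), A.end(), A.begin(), 0.0));
    double NormB =
        std::sqrt(std::inner_product(B.begin(), B.end(), B.begin(), 0.0));
    return Dot / (NormA * NormB);
  }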

Further Details
---------------

For more detailed information about the IR2Vec algorithm, its parameters, and
advanced usage, please refer to the original paper:
`IR2Vec: LLVM IR Based Scalable Program Embeddings <https://doi.org/10.1145/3418463>`_.

For information about using the IR2Vec tool for generating embeddings and
triplets from LLVM IR, see :doc:`CommandGuide/llvm-ir2vec`.

The LLVM source code for ``IR2Vec`` can also be explored to understand the
implementation details.

Building with ML support
========================

.. note::

  For up to date information on custom builds, see the ``ml-*``
  `build bots <http://lab.llvm.org>`_. They are set up using
  `this script <https://github.com/google/ml-compiler-opt/blob/main/buildbot/buildbot_init.sh>`_.

Embed pre-trained models (aka "release" mode)
---------------------------------------------

This supports the ``ReleaseModeModelRunner`` model runner.

You need a tensorflow pip package for the AOT (ahead-of-time) Saved Model compiler
and a thin wrapper for the native function generated by it. We currently support
TF 2.15. We recommend using a python virtual env (in which case, remember to
pass ``-DPython3_ROOT_DIR`` to ``cmake``).

Once you install the pip package, find where it was installed:

.. code-block:: console

  TF_PIP=$(sudo -u buildbot python3 -c "import tensorflow as tf; import os; print(os.path.dirname(tf.__file__))")

Then build LLVM:

.. code-block:: console

  cmake -DTENSORFLOW_AOT_PATH=$TF_PIP \
    -DLLVM_INLINER_MODEL_PATH=<path to inliner saved model dir> \
    -DLLVM_RAEVICT_MODEL_PATH=<path to regalloc eviction saved model dir> \
    <...other options...>

The example shows the flags for both inlining and regalloc, but either may be
omitted.

You can also specify a URL for the path, and it is also possible to pre-compile
the header and object and then just point to the precompiled artifacts. See for
example ``LLVM_OVERRIDE_MODEL_HEADER_INLINERSIZEMODEL``.

.. note::

  We are transitioning away from the AOT compiler shipping with the
  tensorflow package, and to an EmitC, in-tree solution, so these details will
  change soon.

Using TFLite (aka "development" mode)
-------------------------------------

This supports the ``ModelUnderTrainingRunner`` model runner.

Build the TFLite package using `this script <https://raw.githubusercontent.com/google/ml-compiler-opt/refs/heads/main/buildbot/build_tflite.sh>`_.
Then, assuming you ran that script in ``/tmp/tflitebuild``, just pass
``-C /tmp/tflitebuild/tflite.cmake`` to the ``cmake`` invocation for LLVM.

Interactive Mode (for training / research)
------------------------------------------

The ``InteractiveModelRunner`` is available with no extra dependencies. For the
optimizations that are currently MLGO-enabled, it may be used as follows:

- for inlining: ``-mllvm -enable-ml-inliner=release -mllvm -inliner-interactive-channel-base=<name>``
- for regalloc eviction: ``-mllvm -regalloc-evict-advisor=release -mllvm -regalloc-evict-interactive-channel-base=<name>``

where ``<name>`` is a path fragment. We will expect to find 2 files,
``<name>.in`` (readable, data incoming from the managing process) and
``<name>.out`` (writable, the model runner sends data to the managing process).