| This folder contains an implementation of [automemcpy: A framework for automatic generation of fundamental memory operations](https://research.google/pubs/pub50338/). |
| |
It uses the [Z3 theorem prover](https://github.com/Z3Prover/z3) to enumerate a subset of valid memory function implementations. These implementations are then materialized as C++ code and can be [benchmarked](../) against various [size distributions](../distributions). This process helps the design of efficient implementations for a particular environment (size distribution, processor, or custom compilation options).
| |
This is not enabled by default, as it is mostly useful when tuning the library implementation. To build it, set `LIBC_BUILD_AUTOMEMCPY=ON` when configuring (see below).
| |
| ## Prerequisites |
| |
You may need to build and install `Z3` from source if it is not available on your system.
The instructions below install it into `<Z3_INSTALL_DIR>`.
Note that `make install` may require `sudo`.
| |
| ```shell |
| mkdir -p ~/git |
| cd ~/git |
| git clone https://github.com/Z3Prover/z3.git |
cd z3
python scripts/mk_make.py --prefix=<Z3_INSTALL_DIR>
| cd build |
| make -j |
| make install |
| ``` |
| |
| ## Configuration |
| |
| ```shell |
| mkdir -p <BUILD_DIR> |
| cd <LLVM_PROJECT_DIR>/llvm |
| cmake -DCMAKE_C_COMPILER=/usr/bin/clang \ |
| -DCMAKE_CXX_COMPILER=/usr/bin/clang++ \ |
| -DLLVM_ENABLE_PROJECTS="libc" \ |
| -DLLVM_ENABLE_Z3_SOLVER=ON \ |
| -DLLVM_Z3_INSTALL_DIR=<Z3_INSTALL_DIR> \ |
| -DLIBC_BUILD_AUTOMEMCPY=ON \ |
| -DCMAKE_BUILD_TYPE=Release \ |
| -B<BUILD_DIR> |
| ``` |
| |
| ## Targets and compilation |
| |
There are three main CMake targets:
| 1. `automemcpy_implementations` |
   - runs `Z3` and materializes valid memory functions as C++ code; a message will display the on-disk location of the generated file.
| - the source code is then compiled using the native host optimizations (i.e. `-march=native` or `-mcpu=native` depending on the architecture). |
| 2. `automemcpy` |
| - the binary that benchmarks the autogenerated implementations. |
| 3. `automemcpy_result_analyzer` |
| - the binary that analyses the benchmark results. |
| |
You only need to compile the two binaries, as they both pull in the autogenerated code as a dependency.
| |
| ```shell |
| make -C <BUILD_DIR> -j automemcpy automemcpy_result_analyzer |
| ``` |
| |
| ## Running the benchmarks |
| |
Make sure to save the results of the benchmark as a JSON file.
| |
| ```shell |
| <BUILD_DIR>/bin/automemcpy --benchmark_out_format=json --benchmark_out=<RESULTS_DIR>/results.json |
| ``` |
| |
### Additional useful options

- `--benchmark_min_time=.2`
| |
  By default, each function is benchmarked for at least one second; here we lower that to 200ms.
| |
| - `--benchmark_filter="BM_Memset|BM_Bzero"` |
| |
  By default, all functions are benchmarked; here we restrict the run to `memset` and `bzero`.
| |
Other options might be useful; use `--help` for more information.
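
For instance, the options above combine with the output flags from the previous section into a quick, filtered run:

```shell
# 200ms per function, memset/bzero only, results saved as JSON.
<BUILD_DIR>/bin/automemcpy --benchmark_min_time=.2 \
    --benchmark_filter="BM_Memset|BM_Bzero" \
    --benchmark_out_format=json \
    --benchmark_out=<RESULTS_DIR>/results.json
```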
| |
| ## Analyzing the benchmarks |
| |
Analysis is performed by running `automemcpy_result_analyzer` on one or more JSON result files.
| |
| ```shell |
| <BUILD_DIR>/bin/automemcpy_result_analyzer <RESULTS_DIR>/results.json |
| ``` |
| |
| What it does: |
| 1. Gathers all throughput values for each function / distribution pair and picks the median one.\ |
| This allows picking a representative value over many runs of the benchmark. Please make sure all the runs happen under similar circumstances. |
| |
2. For each distribution, looks at the span of throughputs for functions of the same type (e.g. for distribution `A`, memcpy throughput spans 2GiB/s to 5GiB/s).
| |
3. For each distribution, gives a normalized score to each function, i.e. its relative position within that span (e.g. for distribution `A`, function `M` reaching 3.95GiB/s in the 2GiB/s-5GiB/s span scores (3.95-2)/(5-2) = 0.65).\
This score is then turned into a grade (`EXCELLENT`, `VERY_GOOD`, `GOOD`, `PASSABLE`, `INADEQUATE`, `MEDIOCRE`, `BAD`) so that each distribution categorizes how functions perform according to it.
| |
4. A [Majority Judgement](https://en.wikipedia.org/wiki/Majority_judgment) process is then used to categorize each function. This enables a finer analysis of how the distributions agree on which function is better. In the following example, `Function_1` and `Function_2` are both rated `EXCELLENT`, but looking at the distribution of grades may help decide which one is best.
| |
| | | EXCELLENT | VERY_GOOD | GOOD | PASSABLE | INADEQUATE | MEDIOCRE | BAD | |
| |------------|:---------:|:---------:|:----:|:--------:|:----------:|:--------:|:---:| |
| | Function_1 | 7 | 1 | 2 | | | | | |
| | Function_2 | 6 | 4 | | | | | | |
| |
The tool outputs the histogram of grades for each function. In case of a tie, other dimensions might help decide (e.g. code size, performance on other microarchitectures).
| |
| ``` |
| EXCELLENT |█▁▂ | Function_0 |
| EXCELLENT |█▅ | Function_1 |
| VERY_GOOD |▂█▁ ▁ | Function_2 |
| GOOD | ▁█▄ | Function_3 |
| PASSABLE | ▂▆▄█ | Function_4 |
| INADEQUATE | ▃▃█▁ | Function_5 |
| MEDIOCRE | █▆▁| Function_6 |
| BAD | ▁▁█| Function_7 |
| ``` |