| ================================== |
| Benchmarking tips |
| ================================== |
| |
| |
| Introduction |
| ============ |
| |
| When benchmarking a patch, we want to reduce every source of noise |
| as far as possible. How to do that depends heavily on the OS. |
| |
| Note that low noise is necessary but not sufficient: it does not |
| rule out measurement bias. See, for example, |
| `"Producing Wrong Data Without Doing Anything Obviously Wrong!" by Mytkowicz, Diwan, Hauswirth and Sweeney (ASPLOS 2009) <https://users.cs.northwestern.edu/~robby/courses/322-2013-spring/mytkowicz-wrong-data.pdf>`_. |
| |
| General |
| ================================ |
| |
| * Use a high-resolution timer, e.g., perf under Linux. |
| |
| * Run the benchmark multiple times to be able to recognize noise. |
| |
| * Disable as many processes or services as possible on the target system. |
| |
| * Disable frequency scaling, Turbo Boost and address space layout |
| randomization (ASLR); see the OS-specific sections. |
| |
| * Use static linking if the OS supports it. That avoids any variation that |
| might be introduced by loading dynamic libraries. This can be done |
| by passing ``-DLLVM_BUILD_STATIC=ON`` to CMake. |
| |
| * Try to avoid storage. On some systems, you can use tmpfs. Putting the |
| program, inputs and outputs on tmpfs avoids touching a real storage |
| system, whose performance can vary considerably from run to run. |
| |
| To mount it (on Linux and FreeBSD at least):: |
| |
| mount -t tmpfs -o size=<XX>g none dir_to_mount |
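| |
| Where perf is not available, the advice to run the benchmark multiple |
| times can be followed with a small shell helper. The following is a |
| minimal sketch, not part of any existing tool: ``time_runs`` and |
| ``spread`` are hypothetical names, and the timing relies on GNU |
| ``date`` (for ``%N``), so it will not work unmodified on FreeBSD. |
| |
```shell
#!/bin/sh
# Hypothetical helper: run a command N times and print the wall-clock
# time of each run in nanoseconds, one value per line.
# Assumes GNU date, which supports %N (nanoseconds).
time_runs() {
    runs=$1
    shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s%N)
        # The command's output and exit status are deliberately ignored.
        "$@" >/dev/null 2>&1 || true
        end=$(date +%s%N)
        echo $((end - start))
        i=$((i + 1))
    done
}

# Hypothetical helper: read one number per line and print the minimum,
# the maximum and the spread (max - min) as a percentage of the minimum.
spread() {
    awk 'NR == 1 { min = max = $1 }
         { if ($1 < min) min = $1; if ($1 > max) max = $1 }
         END {
             if (NR > 0 && min > 0)
                 printf "min=%.0f max=%.0f spread=%.2f%%\n",
                        min, max, (max - min) * 100 / min
         }'
}
```
| |
| For example, ``time_runs 10 <cmd> | spread`` prints a one-line summary; |
| a large spread suggests that the system is still too noisy. |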
| |
| Linux |
| ===== |
| |
| * Disable address space randomization:: |
| |
| echo 0 > /proc/sys/kernel/randomize_va_space |
| |
| * Set scaling_governor to performance:: |
| |
| for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor |
| do |
| echo performance > "$i" |
| done |
| |
| * Use https://github.com/lpechacek/cpuset to reserve CPU cores for just the |
| program you are benchmarking. If using perf, reserve at least 2 cores |
| so that perf runs on one and your program on another:: |
| |
| cset shield -c N1,N2 -k on |
| |
| This will move all threads off cores N1 and N2. The ``-k on`` means |
| that even kernel threads are moved off them. |
| |
| * Disable the SMT sibling of each CPU you will use for the benchmark. |
| The siblings of CPU N (including N itself) are listed in |
| ``/sys/devices/system/cpu/cpuN/topology/thread_siblings_list``; a |
| sibling X can be disabled with:: |
| |
| echo 0 > /sys/devices/system/cpu/cpuX/online |
| |
| |
| * Run the program with:: |
| |
| cset shield --exec -- perf stat -r 10 <cmd> |
| |
| This will run the command after ``--`` on the isolated CPU cores. This |
| particular perf command runs ``<cmd>`` 10 times and reports |
| statistics. |
| |
| With these in place, you can expect perf variations of less than 0.1% between runs. |
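| |
| The SMT step above requires reading ``thread_siblings_list``, whose |
| contents may be a comma-separated list (such as ``0,4``) or a range |
| (such as ``2-3``). The following is a minimal sketch of how the |
| siblings of a CPU could be extracted; ``smt_siblings`` is a |
| hypothetical helper, and its optional second argument (an alternative |
| sysfs root) exists only to make it easy to try out. |
| |
```shell
#!/bin/sh
# Hypothetical helper: print the SMT siblings of a CPU, one per line,
# excluding the CPU itself.
# Usage: smt_siblings CPU [SYSFS_CPU_ROOT]
smt_siblings() {
    cpu=$1
    root=${2:-/sys/devices/system/cpu}
    list=$(cat "$root/cpu$cpu/topology/thread_siblings_list") || return 1
    # Expand entries such as "0,4" or "2-3" into individual CPU numbers.
    echo "$list" | tr ',' '\n' | while IFS=- read -r lo hi; do
        hi=${hi:-$lo}
        n=$lo
        while [ "$n" -le "$hi" ]; do
            if [ "$n" -ne "$cpu" ]; then
                echo "$n"
            fi
            n=$((n + 1))
        done
    done
}

# Example (as root): take every sibling of CPU 2 offline.
#   for s in $(smt_siblings 2); do
#       echo 0 > "/sys/devices/system/cpu/cpu$s/online"
#   done
```
| |
| Writing ``1`` to the same ``online`` file brings a CPU back after the |
| benchmark. |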
| |
| Linux Intel |
| ----------- |
| |
| * Disable Turbo Boost:: |
| |
| echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo |
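| |
| Because these settings do not survive a reboot, it can help to verify |
| them before each benchmarking session. The following is a minimal |
| sketch under that assumption; ``check_setting`` and |
| ``check_benchmark_settings`` are hypothetical helpers that only read |
| the files named in this document and change nothing. |
| |
```shell
#!/bin/sh
# Hypothetical helper: report whether a file holds the expected value.
# Files that do not exist or cannot be read are reported and skipped.
check_setting() {
    file=$1
    expected=$2
    if [ ! -r "$file" ]; then
        echo "SKIP $file (not readable)"
        return 0
    fi
    actual=$(cat "$file")
    if [ "$actual" = "$expected" ]; then
        echo "OK   $file = $actual"
    else
        echo "BAD  $file = $actual (expected $expected)"
        return 1
    fi
}

# Hypothetical helper: check the Linux settings described in this document.
# Returns non-zero if any readable setting has an unexpected value.
check_benchmark_settings() {
    status=0
    check_setting /proc/sys/kernel/randomize_va_space 0 || status=1
    check_setting /sys/devices/system/cpu/intel_pstate/no_turbo 1 || status=1
    for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
        check_setting "$g" performance || status=1
    done
    return "$status"
}
```
| |
| Calling ``check_benchmark_settings`` prints one line per file, so a |
| forgotten step shows up as a ``BAD`` line before any measurement is taken. |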