==================================
Benchmarking tips
==================================


Introduction
============

When benchmarking a patch, we want to reduce every source of noise as
much as possible. How to do that depends heavily on the OS.

Note that low noise is necessary but not sufficient: it does not rule
out measurement bias.
See, for example,
`"Producing Wrong Data Without Doing Anything Obviously Wrong!" by Mytkowicz, Diwan, Hauswirth and Sweeney (ASPLOS 2009) <https://users.cs.northwestern.edu/~robby/courses/322-2013-spring/mytkowicz-wrong-data.pdf>`_.

General
================================

* Use a high-resolution timer, e.g. ``perf`` on Linux.

* Run the benchmark multiple times so that noise can be recognized.

* Disable as many processes or services as possible on the target system.

* Disable frequency scaling, turbo boost and address space
  randomization (see the OS-specific sections).

* Link statically if the OS supports it. That avoids any variation that
  might be introduced by loading dynamic libraries. This can be done
  by passing ``-DLLVM_BUILD_STATIC=ON`` to CMake.

* Try to avoid storage devices. On some systems you can use tmpfs. Putting
  the program, inputs and outputs on tmpfs avoids touching a real storage
  system, whose access times can vary considerably.

  To mount it (on Linux and FreeBSD at least)::

    mount -t tmpfs -o size=<XX>g none dir_to_mount
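As a rough illustration of the "run it multiple times" advice, the
following sketch times a command repeatedly and reports the mean and the
relative standard deviation. The helper names ``time_runs`` and
``summarize`` are made up for this example, and it assumes a POSIX shell,
awk and a ``date`` that supports ``%N`` (as GNU coreutils does). Wall-clock
time taken with ``date`` is much coarser than what ``perf`` reports, so
treat this as a quick sanity check, not a replacement for a proper timer.

```shell
# Hypothetical helpers, for illustration only.

# summarize: read one number per line on stdin and print the sample count,
# mean, sample standard deviation and relative standard deviation.
summarize() {
  awk '
    { n++; sum += $1; sumsq += $1 * $1 }
    END {
      if (n < 2) { print "need at least 2 samples"; exit 1 }
      mean = sum / n
      var = (sumsq - n * mean * mean) / (n - 1)
      if (var < 0) var = 0   # guard against floating-point rounding
      sd = sqrt(var)
      rel = (mean ? 100 * sd / mean : 0)
      printf "n=%d mean=%.6f stddev=%.6f rel=%.2f%%\n", n, mean, sd, rel
    }'
}

# time_runs N CMD [ARGS...]: run CMD N times and print the wall-clock
# seconds of each run, one per line. Assumes `date +%s.%N` works.
time_runs() (
  runs=$1
  shift
  i=0
  while [ "$i" -lt "$runs" ]; do
    start=$(date +%s.%N)
    "$@" > /dev/null 2>&1
    end=$(date +%s.%N)
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.6f\n", e - s }'
    i=$((i + 1))
  done
)
```

For example, ``time_runs 10 ./my_benchmark | summarize`` prints a single
summary line (``./my_benchmark`` is a placeholder). A relative standard
deviation that is large compared to the effect you are trying to measure
suggests the setup is still too noisy.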

Linux
=====

* Disable address space randomization::

    echo 0 > /proc/sys/kernel/randomize_va_space

* Set ``scaling_governor`` to ``performance``::

    for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    do
      echo performance > "$i"
    done

* Use https://github.com/lpechacek/cpuset to reserve CPUs for just the
  program you are benchmarking. If using ``perf``, reserve at least 2 cores
  so that ``perf`` runs on one and your program on another::

    cset shield -c N1,N2 -k on

  This will move all threads off CPUs N1 and N2. The ``-k on`` means
  that even kernel threads are moved out.

* Disable the SMT sibling of each CPU you will use for the benchmark. The
  sibling of CPU N is listed in
  ``/sys/devices/system/cpu/cpuN/topology/thread_siblings_list``; to
  disable sibling X, run::

    echo 0 > /sys/devices/system/cpu/cpuX/online

* Run the program with::

    cset shield --exec -- perf stat -r 10 <cmd>

  This runs the command after ``--`` on the isolated CPUs. This
  particular ``perf`` invocation runs ``<cmd>`` 10 times and reports
  statistics.

With these in place you can expect ``perf`` to report variations of less
than 0.1%.
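The SMT sibling lookup can be scripted. The sketch below assumes a POSIX
shell and the ``thread_siblings_list`` files in sysfs;
``expand_cpu_list`` and ``smt_siblings`` are hypothetical helper names, and
the optional second argument of ``smt_siblings`` exists only so the function
can be pointed at a test directory instead of the real sysfs tree.

```shell
# Hypothetical helpers, for illustration only.

# expand_cpu_list LIST: turn a kernel CPU list such as "0,4" or "2-3"
# into space-separated CPU numbers ("0 4" or "2 3").
expand_cpu_list() (
  out=
  IFS=,
  for part in $1; do
    case $part in
      *-*)
        lo=${part%-*}
        hi=${part#*-}
        i=$lo
        while [ "$i" -le "$hi" ]; do
          out="$out $i"
          i=$((i + 1))
        done
        ;;
      *)
        out="$out $part"
        ;;
    esac
  done
  echo "${out# }"
)

# smt_siblings CPU [ROOT]: print the SMT siblings of CPU, excluding CPU
# itself. ROOT defaults to /sys/devices/system/cpu.
smt_siblings() (
  cpu=$1
  root=${2:-/sys/devices/system/cpu}
  file=$root/cpu$cpu/topology/thread_siblings_list
  [ -r "$file" ] || { echo "cannot read $file" >&2; exit 1; }
  out=
  for c in $(expand_cpu_list "$(cat "$file")"); do
    [ "$c" = "$cpu" ] || out="$out $c"
  done
  echo "${out# }"
)

# Possible use, as root, to take the siblings of cpu 2 offline:
#   for s in $(smt_siblings 2); do
#     echo 0 > /sys/devices/system/cpu/cpu$s/online
#   done
```

The functions only read files; the write to ``online`` is left as a
comment so that nothing changes on the system until you run it yourself.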

Linux Intel
-----------

* Disable turbo mode::

    echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
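Settings written through ``/proc`` and ``/sys`` do not survive a reboot, so
it can be handy to confirm they are in effect before a benchmarking
session. The following read-only sketch checks the address space
randomization, governor and turbo settings described in this document. The
function name ``check_bench_env`` and its optional root-directory argument
are assumptions made for this example; files that do not exist (for
instance ``intel_pstate`` on a non-Intel machine) are skipped silently.

```shell
# Hypothetical helper, for illustration only.

# check_bench_env [ROOT]: print one line per setting that differs from the
# recommended value and return non-zero if any was found. ROOT is
# prepended to every path and defaults to the empty string.
check_bench_env() (
  root=${1:-}
  status=0

  f=$root/proc/sys/kernel/randomize_va_space
  if [ -r "$f" ] && [ "$(cat "$f")" != 0 ]; then
    echo "address space randomization is enabled ($f is not 0)"
    status=1
  fi

  for f in "$root"/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    [ -r "$f" ] || continue
    val=$(cat "$f")
    if [ "$val" != performance ]; then
      echo "governor is '$val' in $f"
      status=1
    fi
  done

  f=$root/sys/devices/system/cpu/intel_pstate/no_turbo
  if [ -r "$f" ] && [ "$(cat "$f")" != 1 ]; then
    echo "turbo mode is enabled ($f is not 1)"
    status=1
  fi

  exit "$status"
)
```

Calling ``check_bench_env`` with no argument inspects the running system;
an empty output and a zero exit status mean that every setting it could
read matches the values given above.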