| =================================================================== |
| How To Add Your Build Configuration To LLVM Buildbot Infrastructure |
| =================================================================== |
| |
| Introduction |
| ============ |
| |
| This document contains information about adding a build configuration and |
| buildbot worker to the LLVM Buildbot Infrastructure. |
| |
| .. note:: The term "buildmaster" is used in this document to refer to the |
| server that manages which builds are run and where. Though we would not |
| normally choose to use "master" terminology, it is used in this document |
| because it is the term that the Buildbot package currently |
| `uses <https://github.com/buildbot/buildbot/issues/5382>`_. |
| |
| Buildmasters |
| ============ |
| |
| There are two buildmasters running. |
| |
| * The main buildmaster at `<https://lab.llvm.org/buildbot>`_. All builders |
| attached to this machine will notify commit authors every time they break |
| the build. |
| * The staging buildmaster at `<https://lab.llvm.org/staging>`_. All builders |
| attached to this machine will be completely silent by default when the build |
| is broken. This buildmaster is reconfigured every two hours with any new |
| commits from the llvm-zorg repository. |
| |
| In order to remain connected to the main buildmaster (and thus notify |
| developers of failures), a buildbot must: |
| |
| * Be building a supported configuration. Builders for experimental backends |
| should generally be attached to the staging buildmaster. |
| * Be able to keep up with new commits to the main branch, or at a minimum |
| recover to tip of tree within a couple of days of falling behind. |
| |
| Additionally, we encourage all bot owners to point their bots towards the |
| staging buildmaster during maintenance windows, instability troubleshooting, |
| and similar periods. |
| |
| Roles & Expectations |
| ==================== |
| |
| Each buildbot has an owner who is responsible for addressing problems |
| that arise with that buildbot. We generally expect the bot owner to be |
| reasonably responsive. |
| |
| For some bots, the ownership responsibility is split between a "resource owner" |
| who provides the underlying machine resource, and a "configuration owner" who |
| maintains the build configuration. Generally, operational responsibility lies |
| with the "configuration owner". We do expect "resource owners" (who are generally |
| the contact listed in a worker's attributes) to proxy requests to the relevant |
| "configuration owner" in a timely manner. |
| |
| Most issues with a buildbot should be addressed directly with a bot owner |
| via email. Please CC `Galina Kistanova <mailto:gkistanova@gmail.com>`_. |
| |
| Steps To Add Builder To LLVM Buildbot |
| ===================================== |
| Volunteers can provide their build machines to act as workers for the |
| public LLVM Buildbot. |
| |
| Here are the steps you can follow to do so: |
| |
| #. Check the existing build configurations to make sure the one you are |
| interested in is not already covered, or that it builds much faster on |
| your computer than on the existing one. We prefer faster builds so developers |
| will get feedback sooner after changes get committed. |
| |
| #. The computer you will be registering with the LLVM buildbot |
| infrastructure should have all dependencies installed and be able to |
| build your configuration successfully. Please check what degree |
| of parallelism (the ``-j`` parameter) gives the fastest build. You can build |
| multiple configurations on one computer. |
| |
| #. Install buildbot-worker (currently we are using buildbot version 2.8.4). |
| This specific version can be installed using ``pip``, with a command such |
| as ``pip3 install buildbot-worker==2.8.4``. |
| |
| #. Create a designated user account for your buildbot-worker to run under, |
| and set appropriate permissions. |
| |
| #. Choose the buildbot-worker root directory (all builds will be placed under |
| it), and the buildbot-worker access name and password that the buildmaster |
| will use to authenticate your buildbot-worker. |
| |
| #. Create a buildbot-worker in the context of that buildbot-worker account. Point it |
| to the **lab.llvm.org** port **9994** (see `Buildbot documentation, |
| Creating a worker |
| <http://docs.buildbot.net/current/tutorial/firstrun.html#creating-a-worker>`_ |
| for more details) by running the following command: |
| |
| .. code-block:: bash |
| |
| $ buildbot-worker create-worker <buildbot-worker-root-directory> \ |
| lab.llvm.org:9994 \ |
| <buildbot-worker-access-name> \ |
| <buildbot-worker-access-password> |
| |
| A new worker should be pointed at the main buildmaster only once it is |
| stable and approval from Galina has been received (see the last step). |
| |
| Now start the worker: |
| |
| .. code-block:: bash |
| |
| $ buildbot-worker start <buildbot-worker-root-directory> |
| |
| This will cause your new worker to connect to the staging buildmaster |
| which is silent by default. |
| |
| Try this once then check the log file |
| ``<buildbot-worker-root-directory>/worker/twistd.log``. If your settings |
| are correct you will see a refused connection. This is good and expected, |
| as the credentials have not been established on both ends. Now stop the |
| worker and proceed to the next steps. |
| |
| #. Fill in the buildbot-worker description and admin name/e-mail. Here is an |
| example of the buildbot-worker description:: |
| |
| Windows 7 x64 |
| Core i7 (2.66GHz), 16GB of RAM |
| |
| g++.exe (TDM-1 mingw32) 4.4.0 |
| GNU Binutils 2.19.1 |
| cmake version 2.8.4 |
| Microsoft(R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86 |
| |
| See `here <http://docs.buildbot.net/current/manual/installation/worker.html>`_ |
| for which files to edit. |
| |
| #. Send a patch which adds your build worker and your builder to |
| `zorg <https://github.com/llvm/llvm-zorg>`_. Use the typical LLVM |
| `workflow <https://llvm.org/docs/Contributing.html#how-to-submit-a-patch>`_. |
| |
| * workers are added to ``buildbot/osuosl/master/config/workers.py`` |
| * builders are added to ``buildbot/osuosl/master/config/builders.py`` |
| |
| Please make sure your builder name and its builddir are unique throughout the |
| file. |
| |
| All new builders should default to using the ``'collapseRequests': False`` |
| configuration. This causes the builder to build each commit individually |
| and not merge build requests. To maximize quality of feedback to developers, |
| we *strongly prefer* builders to be configured not to collapse requests. |
| This flag should be removed only after all reasonable efforts have been |
| exhausted to improve build times such that the builder can keep up with |
| commit flow. |
| |
| It is possible to allow email addresses to unconditionally receive |
| notifications on build failure; for this you'll need to add an |
| ``InformativeMailNotifier`` to ``buildbot/osuosl/master/config/status.py``. |
| This is particularly useful for the staging buildmaster which is silent |
| otherwise. |
| |
| #. Send the buildbot-worker access name and the access password directly to |
| `Galina Kistanova <mailto:gkistanova@gmail.com>`_, and wait until she |
| lets you know that your changes are applied and buildmaster is |
| reconfigured. |
| |
| #. Make sure you can start the buildbot-worker and successfully connect |
| to the silent (staging) buildmaster. Then set up your buildbot-worker to start |
| automatically at boot time. See the buildbot documentation |
| for help. You may want to restart your computer to see if it works. |
| |
| #. Check the status of your buildbot-worker on the `Waterfall Display (Staging) |
| <http://lab.llvm.org/staging/#/waterfall>`_ to make sure it is |
| connected, and the `Workers Display (Staging) |
| <http://lab.llvm.org/staging/#/workers>`_ to see if administrator |
| contact and worker information are correct. |
| |
| #. At this point, you have a working builder connected to the staging |
| buildmaster. You can now make sure it is reliably green and keeps |
| up with the build queue. No notifications will be sent, so you can |
| keep an unstable builder connected to staging indefinitely. |
| |
| #. (Optional) Once the builder is stable on the staging buildmaster with |
| several days of green history, you can choose to move it to the production |
| buildmaster to enable developer notifications. Please email `Galina |
| Kistanova <mailto:gkistanova@gmail.com>`_ for review and approval. |
| |
| To move a worker to production (once approved), stop your worker, edit the |
| ``buildbot.tac`` file to change the port number from 9994 to 9990 and start it |
| again. |
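
As a minimal sketch of the port change described in the last step, the edit could be scripted as shown below. It assumes the ``buildbot.tac`` generated by ``create-worker`` contains a line of the form ``port = 9994``; check your own file before relying on this. The function name ``switch_buildmaster_port`` and the throwaway sample file are hypothetical and exist only for illustration.

```shell
# Hypothetical helper: rewrite the buildmaster port in a buildbot.tac file.
# Assumes the file contains a line of the form "port = <number>".
switch_buildmaster_port() {
    tac_file="$1"
    old_port="$2"
    new_port="$3"
    # Write to a temporary file first so a failed edit cannot truncate the original.
    sed -e "s/^port = ${old_port}\$/port = ${new_port}/" "$tac_file" > "${tac_file}.new" \
        && mv "${tac_file}.new" "$tac_file"
}

# Demonstration on a throwaway file rather than a real worker directory.
demo_dir="$(mktemp -d)"
printf '%s\n' "buildmaster_host = 'lab.llvm.org'" "port = 9994" > "${demo_dir}/buildbot.tac"
switch_buildmaster_port "${demo_dir}/buildbot.tac" 9994 9990
grep '^port' "${demo_dir}/buildbot.tac"
```

On a real worker, stop the worker before running the edit and start it again afterwards, as described in the step above.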
| |
| Testing a Builder Config Locally |
| ================================ |
| |
| It is possible to test a builder running against a local version of LLVM's |
| buildmaster setup. This allows you to test changes to builder, worker, and |
| buildmaster configuration. A buildmaster launched in this "local testing" mode |
| will: |
| |
| * Bind only to local interfaces. |
| * Use SQLite as the database. |
| * Use a single fixed password for workers. |
| * Disable extras like GitHub authentication. |
| |
| In order to use this "local testing" mode: |
| |
| * Create and activate a Python `venv |
| <https://docs.python.org/3/library/venv.html>`_ and install the necessary |
| dependencies. This step can be run from any directory. |
| |
| .. code-block:: bash |
| |
| python -m venv bbenv |
| source bbenv/bin/activate |
| pip install buildbot{,-console-view,-grid-view,-waterfall-view,-worker,-www}==3.11.7 urllib3 |
| |
| * If your system has Python 3.13 or newer you will need to additionally |
| install ``legacy-cgi`` and make a minor patch to the installed buildbot |
| package, adjusting the ``python3.13`` component of the path below to match |
| your Python version. This step is not needed for earlier Python versions. |
| |
| .. code-block:: bash |
| |
| pip install legacy-cgi |
| sed -i \ |
| -e 's/import pipes/import shlex/' \ |
| -e 's/pipes\.quote/shlex.quote/' \ |
| bbenv/lib/python3.13/site-packages/buildbot_worker/runprocess.py |
| |
| * Initialise the necessary buildmaster files, link to the configuration in a |
| local checkout of `llvm-zorg <https://github.com/llvm/llvm-zorg>`_, and |
| ask ``buildbot`` to check the configuration. This step can be run from any |
| directory. |
| |
| .. code-block:: bash |
| |
| buildbot create-master llvm-testbbmaster |
| cd llvm-testbbmaster |
| ln -s /path/to/checkout/of/llvm-zorg/buildbot/osuosl/master/master.cfg . |
| ln -s /path/to/checkout/of/llvm-zorg/buildbot/osuosl/master/config/ . |
| ln -s /path/to/checkout/of/llvm-zorg/zorg/ . |
| BUILDBOT_TEST=1 buildbot checkconfig |
| |
| * Start the buildmaster. |
| |
| .. code-block:: bash |
| |
| BUILDBOT_TEST=1 buildbot start --nodaemon . |
| |
| * After waiting a few seconds for startup to complete, you should be able to |
| open the web UI at ``http://localhost:8011``. If there are any errors or |
| this isn't working, check ``twistd.log`` (within the current directory) for |
| more information. |
| |
| * You can now create and start a buildbot worker. Ensure you pick the correct |
| name for the worker associated with the build configuration you want to test |
| in ``buildbot/osuosl/master/config/builders.py``. |
| |
| .. code-block:: bash |
| |
| buildbot-worker create-worker <buildbot-worker-root-directory> \ |
| localhost:9990 \ |
| <buildbot-worker-name> \ |
| test |
| buildbot-worker start --nodaemon <buildbot-worker-root-directory> |
| |
| * Either wait until the poller sets off a build, or alternatively force a |
| build to start in the web UI. |
| |
| * Review the progress and results of the build in the web UI. |
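
To make the effect of the Python 3.13 patch step above concrete, the sketch below applies the same two ``sed`` expressions to a small stand-in file instead of the installed ``runprocess.py``. The sample contents are made up for illustration and are not copied from the buildbot sources.

```shell
# Stand-in for runprocess.py; the two lines below are illustrative only.
sample="$(mktemp)"
printf '%s\n' 'import pipes' 'cmd = pipes.quote(arg)' > "$sample"

# Same substitutions as the patch step, written to stdout instead of in place.
patched="$(sed \
    -e 's/import pipes/import shlex/' \
    -e 's/pipes\.quote/shlex.quote/' \
    "$sample")"
printf '%s\n' "$patched"
```

Running the substitutions against a copy first is a cheap way to confirm the patterns match before editing the file inside the venv.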
| |
| This local testing configuration defaults to binding only to the loopback |
| interface for security reasons. |
| |
| If you want to run the test worker on a different machine, or to run the |
| buildmaster on a remote server, ssh port forwarding can be used to make the |
| connection possible. For instance, if the buildmaster is running on a remote |
| server, the following command makes the web UI accessible via |
| ``http://localhost:8011`` and lets a local worker connect |
| to the remote buildmaster by connecting to ``localhost:9990``: |
| |
| .. code-block:: bash |
| |
| ssh -N -L 8011:localhost:8011 -L 9990:localhost:9990 username@buildmaster_server_address |
| |
| Be aware that some build configurations may check out the current upstream |
| ``llvm-zorg`` repository in order to retrieve additional scripts used during |
| the build process, meaning any local changes will not be reflected in this |
| part of the build. If you wish to test changes to any of these scripts without |
| committing them upstream, you will need to temporarily patch the builder logic |
| in order to instead check out your own branch. |
| Typically, ``addGetSourcecodeForProject`` from |
| ``zorg/buildbot/process/factory.py`` is used for this and you can edit the |
| caller to specify your own ``repourl`` and/or ``branch`` keyword argument. |
| |
| Best Practices for Configuring a Fast Builder |
| ============================================= |
| |
| As mentioned above, we generally have a strong preference for |
| builders which can build every commit as it comes in. This section |
| includes best practices and some recommendations on how to achieve |
| that end. |
| |
| The goal |
| In 2020, the monorepo had just under 35 thousand commits. This works |
| out to an average of 4 commits per hour. Already, we can see that a |
| builder must cycle in less than 15 minutes to have a hope of being |
| useful. However, those commits are not uniformly distributed. They |
| tend to cluster strongly during US working hours. Looking at a couple |
| of recent (Nov 2021) working days, we routinely see ~10 commits per |
| hour during peak times, with occasional spikes as high as ~15 commits |
| per hour. Thus, as a rule of thumb, we should plan for our builder to |
| complete ~10-15 builds an hour. |
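
The averages quoted above can be reproduced with a few lines of shell arithmetic. This is only a back-of-the-envelope sketch: it uses the rounded figure of 35,000 commits from the text and a 365-day year, and the variable names are chosen here for illustration.

```shell
# Figure quoted in the text: just under 35,000 commits in 2020.
commits_per_year=35000
hours_per_year=$((365 * 24))

# Integer arithmetic rounds down, so add half the divisor to round to nearest.
commits_per_hour=$(( (commits_per_year + hours_per_year / 2) / hours_per_year ))
minutes_per_commit=$((60 / commits_per_hour))

echo "average: ${commits_per_hour} commits/hour, one every ${minutes_per_commit} minutes"
```

This is the yearly average only; the peak-time rates of 10 to 15 commits per hour are what drive the sizing rule of thumb.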
| |
| Resource Appropriately |
| At 10-15 builds per hour, we need to complete a new build on average every |
| 4 to 6 minutes. For anything except the fastest of hardware/build configs, |
| this is going to be well beyond the ability of a single machine. In buildbot |
| terms, we are likely going to need multiple workers to build requests in parallel |
| under a single builder configuration. For some rough back of the envelope |
| numbers, if your build config takes e.g. 30 minutes, you will need something |
| on the order of 5-8 workers. If your build config takes ~2 hours, you'll |
| need something on the order of 20-30 workers. The rest of this section |
| focuses on how to reduce cycle times. |
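
The worker counts above follow from multiplying the build time by the required build rate. The sketch below shows that calculation under the simplifying assumption that workers run fully in parallel with no scheduling overhead; ``workers_needed`` is a hypothetical helper, not part of any buildbot tooling.

```shell
# Hypothetical helper: number of workers needed so that a builder taking
# build_minutes per build can complete builds_per_hour builds, rounded up.
workers_needed() {
    build_minutes="$1"
    builds_per_hour="$2"
    echo $(( (build_minutes * builds_per_hour + 59) / 60 ))
}

echo "30 min build:  $(workers_needed 30 10)-$(workers_needed 30 15) workers"
echo "120 min build: $(workers_needed 120 10)-$(workers_needed 120 15) workers"
```

Plugging in your own measured build time gives a first estimate of how much hardware a non-collapsing builder would need.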
| |
| Restrict what you build and test |
| Think hard about why you're setting up a bot, and restrict your build |
| configuration as much as you can. Basic functionality is probably |
| already covered by other bots, and you don't need to duplicate that |
| testing. You only need to be building and testing the *unique* parts |
| of the configuration. For example, for a multi-stage clang builder, you probably |
| don't need to enable every target or build all the various utilities. |
| |
| It can sometimes be worthwhile splitting a single builder into two or more, |
| if you have multiple distinct purposes for the same builder. As an example, |
| if you want both to a) confirm that all of LLVM builds with your host |
| compiler, and b) do a multi-stage clang build on your target, you |
| may be better off with two separate bots. Splitting increases resource |
| consumption, but makes it easy for each bot to keep up with commit flow. |
| Additionally, splitting bots may assist in triage by narrowing attention to |
| relevant parts of the failing configuration. |
| |
| In general, we recommend Release build types with Assertions enabled. This |
| generally provides a good balance between build times and bug detection for |
| most buildbots. There may be room for including some debug info (e.g. with |
| ``-gmlt``), but in general the balance between debug info quality and build |
| times is a delicate one. |
| |
| Use Ninja & LLD |
| Ninja really does help build times over Make, particularly for highly |
| parallel builds. LLD helps to reduce both link times and memory usage |
| during linking significantly. With a build machine with sufficient |
| parallelism, link times tend to dominate the critical path of the build, and are |
| thus worth optimizing. |
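
As a rough sketch of how the recommendations so far (Release build, assertions enabled, Ninja, LLD) might translate into a configure step, the snippet below prints a candidate ``cmake`` command. The source path is a placeholder, and the option names should be checked against the LLVM CMake documentation for the version you build; real builders normally set these through their llvm-zorg build factory rather than by hand.

```shell
# Placeholder location of the llvm-project checkout; adjust to your layout.
LLVM_SRC="${LLVM_SRC:-/path/to/llvm-project/llvm}"

# Print the configure command instead of running it, so the sketch is safe to
# try on a machine without the toolchain installed.
configure_command() {
    printf '%s ' cmake -G Ninja \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_ASSERTIONS=ON \
        -DLLVM_USE_LINKER=lld \
        "$LLVM_SRC"
    printf '\n'
}

configure_command
```

Printing the command first makes it easy to compare against what an existing builder already passes before changing anything.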
| |
| Use CCache and NOT incremental builds |
| Using ccache materially improves average build times. Incremental builds |
| can be slightly faster, but introduce the risk of build corruption due to |
| e.g. state changes. At this point, the recommendation is not to |
| use incremental builds and instead to use ccache, as the latter captures the |
| majority of the benefit with less risk of false positives. |
| |
| One of the non-obvious benefits of using ccache is that it makes the |
| builder less sensitive to which projects are being monitored vs built. |
| If a change triggers a build request, but doesn't change the build output |
| (e.g. doc changes or Python utility changes), the build will entirely |
| hit in cache and the build request will complete in just the testing time. |
| |
| With multiple workers, it is tempting to try to configure a shared cache |
| between the workers. Experience to date indicates this is difficult to do |
| well, and that having local per-worker caches gets most of the benefit |
| anyway. We don't currently recommend shared caches. |
| |
| CCache does depend on the builder hardware having sufficient IO to access |
| the cache with reasonable access times, i.e. a fast disk or enough memory |
| for a RAM cache. For builders without such hardware, incremental builds may |
| be your best option, but are likely to require higher ongoing involvement from the |
| sponsor. |
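
One way to keep caches local to each worker, sketched here under assumptions, is to give every worker its own ``CCACHE_DIR`` beneath its root directory. The helper name and the placeholder path are hypothetical; ``CCACHE_DIR`` is the environment variable ccache reads for its cache location, and ``LLVM_CCACHE_BUILD`` is the LLVM CMake option that enables ccache, but verify both against the versions you have installed.

```shell
# Hypothetical helper: derive a per-worker cache directory from the worker root.
worker_cache_dir() {
    printf '%s/ccache\n' "$1"
}

# Placeholder worker root; each worker would use its own value here.
CCACHE_DIR="$(worker_cache_dir "${WORKER_ROOT:-/path/to/buildbot-worker-root}")"
export CCACHE_DIR

echo "CCACHE_DIR=${CCACHE_DIR}"
echo "extra CMake option: -DLLVM_CCACHE_BUILD=ON"
```

Because the directory is derived from the worker root, two workers on the same machine never end up sharing a cache by accident.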
| |
| Enable batch builds |
| As a last resort, you can configure your builder to batch build requests. |
| This makes the build failure notifications markedly less actionable, and |
| should only be done once all other reasonable measures have been taken. |
| |
| Leave it on the staging buildmaster |
| While most of this section has been biased towards builders intended for |
| the main buildmaster, it is worth highlighting that builders can run |
| indefinitely on the staging buildmaster. Such a builder may still be |
| useful for the sponsoring organization, without concern of negatively |
| impacting the broader community. The sponsoring organization simply |
| has to take on the responsibility of all bisection and triage. |
| |
| Managing a Worker From The Web Interface |
| ======================================== |
| |
| Tasks such as clearing pending build requests can be done using |
| the Buildbot web interface. To do this you must be recognised as an admin |
| of the worker: |
| |
| * Set your public GitHub profile email to one that was included in the |
| ``admin`` information you set up on the worker. It does not matter if this |
| is your primary account email or a "verified email". To confirm this has been |
| done correctly, go to ``github.com/<your GitHub username>`` and you should |
| see the email address listed there. |
| |
| A worker can have many admins, provided they are listed in the form |
| ``First Last <first.last@example.com>, First2 Last2 <first2.last2@example.com>``. |
| You only need to have one of those addresses in your profile to be recognised |
| as an admin. |
| |
| If you need to add an email address, you can edit the ``admin`` file and |
| restart the worker. You should see the new admin details in the web interface |
| shortly afterwards. |
| |
| * Connect GitHub to Buildbot by clicking on the "Anonymous" button on the |
| top right of the page, then "Login with GitHub" and authorise the app. |
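
When checking which address needs to appear on your GitHub profile, it can help to list the addresses contained in the worker's ``admin`` string. The following is a small sketch that assumes the comma-separated ``Name <address>`` form shown above; ``admin_emails`` is a hypothetical helper and the names are placeholders.

```shell
# Example admin string in the documented format; the names are placeholders.
admin='First Last <first.last@example.com>, First2 Last2 <first2.last2@example.com>'

# Hypothetical helper: print one address per line from an admin string.
admin_emails() {
    printf '%s\n' "$1" | grep -oE '<[^<>]+>' | tr -d '<>'
}

admin_emails "$admin"
```

Any one of the printed addresses on your public GitHub profile is enough to be recognised as an admin.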
| |
| Some tasks don't give immediate feedback, so if nothing happens within a short |
| time, try again with the browser's web console open. Sometimes you will see |
| 403 errors and other messages that might indicate you don't have the correct |
| details set up. |
| |