| ============================== |
| Moving LLVM Projects to GitHub |
| ============================== |
| |
| Current Status |
| ============== |
| |
| We are planning to complete the transition to GitHub by Oct 21, 2019. See |
| the GitHub migration `status page <https://llvm.org/GitHubMigrationStatus.html>`_ |
| for the latest updates and instructions for how to migrate your workflows. |
| |
| .. contents:: Table of Contents |
| :depth: 4 |
| :local: |
| |
| Introduction |
| ============ |
| |
| This is a proposal to move our current revision control system from our own |
| hosted Subversion to GitHub. Below are the financial and technical arguments as |
| to why we are proposing such a move and how people (and validation |
| infrastructure) will continue to work with a Git-based LLVM. |
| |
| What This Proposal is *Not* About |
| ================================= |
| |
| Changing the development policy. |
| |
| This proposal relates only to moving the hosting of our source-code repository |
| from SVN hosted on our own servers to Git hosted on GitHub. We are not proposing |
| using GitHub's issue tracker, pull-requests, or code-review. |
| |
| Contributors will continue to earn commit access on demand under the Developer |
| Policy, except that that a GitHub account will be required instead of SVN |
| username/password-hash. |
| |
| Why Git, and Why GitHub? |
| ======================== |
| |
| Why Move At All? |
| ---------------- |
| |
| This discussion began because we currently host our own Subversion server |
| and Git mirror on a voluntary basis. The LLVM Foundation sponsors the server and |
| provides limited support, but there is only so much it can do. |
| |
| Volunteers are not sysadmins themselves, but compiler engineers that happen |
| to know a thing or two about hosting servers. We also don't have 24/7 support, |
| and we sometimes wake up to see that continuous integration is broken because |
| the SVN server is either down or unresponsive. |
| |
| We should take advantage of one of the services out there (GitHub, GitLab, |
| and BitBucket, among others) that offer better service (24/7 stability, disk |
| space, Git server, code browsing, forking facilities, etc) for free. |
| |
| Why Git? |
| -------- |
| |
| Many new coders nowadays start with Git, and a lot of people have never used |
| SVN, CVS, or anything else. Websites like GitHub have changed the landscape |
| of open source contributions, reducing the cost of first contribution and |
| fostering collaboration. |
| |
| Git is also the version control many LLVM developers use. Despite the |
| sources being stored in a SVN server, these developers are already using Git |
| through the Git-SVN integration. |
| |
| Git allows you to: |
| |
| * Commit, squash, merge, and fork locally without touching the remote server. |
| * Maintain local branches, enabling multiple threads of development. |
| * Collaborate on these branches (e.g. through your own fork of llvm on GitHub). |
| * Inspect the repository history (blame, log, bisect) without Internet access. |
| * Maintain remote forks and branches on Git hosting services and |
| integrate back to the main repository. |
| |
| In addition, because Git seems to be replacing many OSS projects' version |
| control systems, there are many tools that are built over Git. |
| Future tooling may support Git first (if not only). |
| |
| Why GitHub? |
| ----------- |
| |
| GitHub, like GitLab and BitBucket, provides free code hosting for open source |
| projects. Any of these could replace the code-hosting infrastructure that we |
| have today. |
| |
| These services also have a dedicated team to monitor, migrate, improve and |
| distribute the contents of the repositories depending on region and load. |
| |
| GitHub has one important advantage over GitLab and |
| BitBucket: it offers read-write **SVN** access to the repository |
| (https://github.com/blog/626-announcing-svn-support). |
| This would enable people to continue working post-migration as though our code |
| were still canonically in an SVN repository. |
| |
| In addition, there are already multiple LLVM mirrors on GitHub, indicating that |
| part of our community has already settled there. |
| |
| On Managing Revision Numbers with Git |
| ------------------------------------- |
| |
| The current SVN repository hosts all the LLVM sub-projects alongside each other. |
| A single revision number (e.g. r123456) thus identifies a consistent version of |
| all LLVM sub-projects. |
| |
| Git does not use sequential integer revision number but instead uses a hash to |
| identify each commit. |
| |
| The loss of a sequential integer revision number has been a sticking point in |
| past discussions about Git: |
| |
| - "The 'branch' I most care about is mainline, and losing the ability to say |
| 'fixed in r1234' (with some sort of monotonically increasing number) would |
| be a tragic loss." [LattnerRevNum]_ |
| - "I like those results sorted by time and the chronology should be obvious, but |
| timestamps are incredibly cumbersome and make it difficult to verify that a |
| given checkout matches a given set of results." [TrickRevNum]_ |
| - "There is still the major regression with unreadable version numbers. |
| Given the amount of Bugzilla traffic with 'Fixed in...', that's a |
| non-trivial issue." [JSonnRevNum]_ |
| - "Sequential IDs are important for LNT and llvmlab bisection tool." [MatthewsRevNum]_. |
| |
| However, Git can emulate this increasing revision number: |
| ``git rev-list --count <commit-hash>``. This identifier is unique only |
| within a single branch, but this means the tuple `(num, branch-name)` uniquely |
| identifies a commit. |
| |
| We can thus use this revision number to ensure that e.g. `clang -v` reports a |
| user-friendly revision number (e.g. `master-12345` or `4.0-5321`), addressing |
| the objections raised above with respect to this aspect of Git. |
| |
| What About Branches and Merges? |
| ------------------------------- |
| |
| In contrast to SVN, Git makes branching easy. Git's commit history is |
| represented as a DAG, a departure from SVN's linear history. However, we propose |
| to mandate making merge commits illegal in our canonical Git repository. |
| |
| Unfortunately, GitHub does not support server side hooks to enforce such a |
| policy. We must rely on the community to avoid pushing merge commits. |
| |
| GitHub offers a feature called `Status Checks`: a branch protected by |
| `status checks` requires commits to be explicitly allowed before the push can happen. |
| We could supply a pre-push hook on the client side that would run and check the |
| history, before allowing the commit being pushed [statuschecks]_. |
| However this solution would be somewhat fragile (how do you update a script |
| installed on every developer machine?) and prevents SVN access to the |
| repository. |
| |
| What About Commit Emails? |
| ------------------------- |
| |
| We will need a new bot to send emails for each commit. This proposal leaves the |
| email format unchanged besides the commit URL. |
| |
| Straw Man Migration Plan |
| ======================== |
| |
| Step #1 : Before The Move |
| ------------------------- |
| |
| 1. Update docs to mention the move, so people are aware of what is going on. |
| 2. Set up a read-only version of the GitHub project, mirroring our current SVN |
| repository. |
| 3. Add the required bots to implement the commit emails, as well as the |
| umbrella repository update (if the multirepo is selected) or the read-only |
| Git views for the sub-projects (if the monorepo is selected). |
| |
| Step #2 : Git Move |
| ------------------ |
| |
| 4. Update the buildbots to pick up updates and commits from the GitHub |
| repository. Not all bots have to migrate at this point, but it'll help |
| provide infrastructure testing. |
| 5. Update Phabricator to pick up commits from the GitHub repository. |
| 6. LNT and llvmlab have to be updated: they rely on unique monotonically |
| increasing integer across branch [MatthewsRevNum]_. |
| 7. Instruct downstream integrators to pick up commits from the GitHub |
| repository. |
| 8. Review and prepare an update for the LLVM documentation. |
| |
| Until this point nothing has changed for developers, it will just |
| boil down to a lot of work for buildbot and other infrastructure |
| owners. |
| |
| The migration will pause here until all dependencies have cleared, and all |
| problems have been solved. |
| |
| Step #3: Write Access Move |
| -------------------------- |
| |
| 9. Collect developers' GitHub account information, and add them to the project. |
| 10. Switch the SVN repository to read-only and allow pushes to the GitHub repository. |
| 11. Update the documentation. |
| 12. Mirror Git to SVN. |
| |
| Step #4 : Post Move |
| ------------------- |
| |
| 13. Archive the SVN repository. |
| 14. Update links on the LLVM website pointing to viewvc/klaus/phab etc. to |
| point to GitHub instead. |
| |
| GitHub Repository Description |
| ============================= |
| |
| Monorepo |
| ---------------- |
| |
| The LLVM git repository hosted at https://github.com/llvm/llvm-project contains all |
| sub-projects in a single source tree. It is often referred to as a monorepo and |
| mimics an export of the current SVN repository, with each sub-project having its |
| own top-level directory. Not all sub-projects are used for building toolchains. |
| For example, www/ and test-suite/ are not part of the monorepo. |
| |
| Putting all sub-projects in a single checkout makes cross-project refactoring |
| naturally simple: |
| |
| * New sub-projects can be trivially split out for better reuse and/or layering |
| (e.g., to allow libSupport and/or LIT to be used by runtimes without adding a |
| dependency on LLVM). |
| * Changing an API in LLVM and upgrading the sub-projects will always be done in |
| a single commit, designing away a common source of temporary build breakage. |
| * Moving code across sub-project (during refactoring for instance) in a single |
| commit enables accurate `git blame` when tracking code change history. |
| * Tooling based on `git grep` works natively across sub-projects, allowing to |
| easier find refactoring opportunities across projects (for example reusing a |
| datastructure initially in LLDB by moving it into libSupport). |
| * Having all the sources present encourages maintaining the other sub-projects |
| when changing API. |
| |
| Finally, the monorepo maintains the property of the existing SVN repository that |
| the sub-projects move synchronously, and a single revision number (or commit |
| hash) identifies the state of the development across all projects. |
| |
| .. _build_single_project: |
| |
| Building a single sub-project |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| Even though there is a single source tree, you are not required to build |
| all sub-projects together. It is trivial to configure builds for a single |
| sub-project. |
| |
| For example:: |
| |
| mkdir build && cd build |
| # Configure only LLVM (default) |
| cmake path/to/monorepo |
| # Configure LLVM and lld |
| cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=lld |
| # Configure LLVM and clang |
| cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=clang |
| |
| .. _git-svn-mirror: |
| |
| Outstanding Questions |
| --------------------- |
| |
| Read-only sub-project mirrors |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| With the Monorepo, it is undecided whether the existing single-subproject |
| mirrors (e.g. https://git.llvm.org/git/compiler-rt.git) will continue to |
| be maintained. |
| |
| Read/write SVN bridge |
| ^^^^^^^^^^^^^^^^^^^^^ |
| |
| GitHub supports a read/write SVN bridge for its repositories. However, |
| there have been issues with this bridge working correctly in the past, |
| so it's not clear if this is something that will be supported going forward. |
| |
| Monorepo Drawbacks |
| ------------------ |
| |
| * Using the monolithic repository may add overhead for those contributing to a |
| standalone sub-project, particularly on runtimes like libcxx and compiler-rt |
| that don't rely on LLVM; currently, a fresh clone of libcxx is only 15MB (vs. |
| 1GB for the monorepo), and the commit rate of LLVM may cause more frequent |
| `git push` collisions when upstreaming. Affected contributors may be able to |
| use the SVN bridge or the single-subproject Git mirrors. However, it's |
| undecided if these projects will continue to be maintained. |
| * Using the monolithic repository may add overhead for those *integrating* a |
| standalone sub-project, even if they aren't contributing to it, due to the |
| same disk space concern as the point above. The availability of the |
| sub-project Git mirrors would addresses this. |
| * Preservation of the existing read/write SVN-based workflows relies on the |
| GitHub SVN bridge, which is an extra dependency. Maintaining this locks us |
| into GitHub and could restrict future workflow changes. |
| |
| Workflows |
| ^^^^^^^^^ |
| |
| * :ref:`Checkout/Clone a Single Project, without Commit Access <workflow-checkout-commit>`. |
| * :ref:`Checkout/Clone Multiple Projects, with Commit Access <workflow-monocheckout-multicommit>`. |
| * :ref:`Commit an API Change in LLVM and Update the Sub-projects <workflow-cross-repo-commit>`. |
| * :ref:`Branching/Stashing/Updating for Local Development or Experiments <workflow-mono-branching>`. |
| * :ref:`Bisecting <workflow-mono-bisecting>`. |
| |
| Workflow Before/After |
| ===================== |
| |
| This section goes through a few examples of workflows, intended to illustrate |
| how end-users or developers would interact with the repository for |
| various use-cases. |
| |
| .. _workflow-checkout-commit: |
| |
| Checkout/Clone a Single Project, with Commit Access |
| --------------------------------------------------- |
| |
| Currently |
| ^^^^^^^^^ |
| |
| :: |
| |
| # direct SVN checkout |
| svn co https://user@llvm.org/svn/llvm-project/llvm/trunk llvm |
| # or using the read-only Git view, with git-svn |
| git clone https://llvm.org/git/llvm.git |
| cd llvm |
| git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username> |
| git config svn-remote.svn.fetch :refs/remotes/origin/master |
| git svn rebase -l # -l avoids fetching ahead of the git mirror. |
| |
| Commits are performed using `svn commit` or with the sequence `git commit` and |
| `git svn dcommit`. |
| |
| .. _workflow-multicheckout-nocommit: |
| |
| Monorepo Variant |
| ^^^^^^^^^^^^^^^^ |
| |
| With the monorepo variant, there are a few options, depending on your |
| constraints. First, you could just clone the full repository: |
| |
| git clone https://github.com/llvm/llvm-project.git |
| |
| At this point you have every sub-project (llvm, clang, lld, lldb, ...), which |
| :ref:`doesn't imply you have to build all of them <build_single_project>`. You |
| can still build only compiler-rt for instance. In this way it's not different |
| from someone who would check out all the projects with SVN today. |
| |
| If you want to avoid checking out all the sources, you can hide the other |
| directories using a Git sparse checkout:: |
| |
| git config core.sparseCheckout true |
| echo /compiler-rt > .git/info/sparse-checkout |
| git read-tree -mu HEAD |
| |
| The data for all sub-projects is still in your `.git` directory, but in your |
| checkout, you only see `compiler-rt`. |
| Before you push, you'll need to fetch and rebase (`git pull --rebase`) as |
| usual. |
| |
| Note that when you fetch you'll likely pull in changes to sub-projects you don't |
| care about. If you are using sparse checkout, the files from other projects |
| won't appear on your disk. The only effect is that your commit hash changes. |
| |
| You can check whether the changes in the last fetch are relevant to your commit |
| by running:: |
| |
| git log origin/master@{1}..origin/master -- libcxx |
| |
| This command can be hidden in a script so that `git llvmpush` would perform all |
| these steps, fail only if such a dependent change exists, and show immediately |
| the change that prevented the push. An immediate repeat of the command would |
| (almost) certainly result in a successful push. |
| Note that today with SVN or git-svn, this step is not possible since the |
| "rebase" implicitly happens while committing (unless a conflict occurs). |
| |
| Checkout/Clone Multiple Projects, with Commit Access |
| ---------------------------------------------------- |
| |
| Let's look how to assemble llvm+clang+libcxx at a given revision. |
| |
| Currently |
| ^^^^^^^^^ |
| |
| :: |
| |
| svn co https://llvm.org/svn/llvm-project/llvm/trunk llvm -r $REVISION |
| cd llvm/tools |
| svn co https://llvm.org/svn/llvm-project/clang/trunk clang -r $REVISION |
| cd ../projects |
| svn co https://llvm.org/svn/llvm-project/libcxx/trunk libcxx -r $REVISION |
| |
| Or using git-svn:: |
| |
| git clone https://llvm.org/git/llvm.git |
| cd llvm/ |
| git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username> |
| git config svn-remote.svn.fetch :refs/remotes/origin/master |
| git svn rebase -l |
| git checkout `git svn find-rev -B r258109` |
| cd tools |
| git clone https://llvm.org/git/clang.git |
| cd clang/ |
| git svn init https://llvm.org/svn/llvm-project/clang/trunk --username=<username> |
| git config svn-remote.svn.fetch :refs/remotes/origin/master |
| git svn rebase -l |
| git checkout `git svn find-rev -B r258109` |
| cd ../../projects/ |
| git clone https://llvm.org/git/libcxx.git |
| cd libcxx |
| git svn init https://llvm.org/svn/llvm-project/libcxx/trunk --username=<username> |
| git config svn-remote.svn.fetch :refs/remotes/origin/master |
| git svn rebase -l |
| git checkout `git svn find-rev -B r258109` |
| |
| Note that the list would be longer with more sub-projects. |
| |
| .. _workflow-monocheckout-multicommit: |
| |
| Monorepo Variant |
| ^^^^^^^^^^^^^^^^ |
| |
| The repository contains natively the source for every sub-projects at the right |
| revision, which makes this straightforward:: |
| |
| git clone https://github.com/llvm/llvm-project.git |
| cd llvm-projects |
| git checkout $REVISION |
| |
| As before, at this point clang, llvm, and libcxx are stored in directories |
| alongside each other. |
| |
| .. _workflow-cross-repo-commit: |
| |
| Commit an API Change in LLVM and Update the Sub-projects |
| -------------------------------------------------------- |
| |
| Today this is possible, even though not common (at least not documented) for |
| subversion users and for git-svn users. For example, few Git users try to update |
| LLD or Clang in the same commit as they change an LLVM API. |
| |
| The multirepo variant does not address this: one would have to commit and push |
| separately in every individual repository. It would be possible to establish a |
| protocol whereby users add a special token to their commit messages that causes |
| the umbrella repo's updater bot to group all of them into a single revision. |
| |
| The monorepo variant handles this natively. |
| |
| Branching/Stashing/Updating for Local Development or Experiments |
| ---------------------------------------------------------------- |
| |
| Currently |
| ^^^^^^^^^ |
| |
| SVN does not allow this use case, but developers that are currently using |
| git-svn can do it. Let's look in practice what it means when dealing with |
| multiple sub-projects. |
| |
| To update the repository to tip of trunk:: |
| |
| git pull |
| cd tools/clang |
| git pull |
| cd ../../projects/libcxx |
| git pull |
| |
| To create a new branch:: |
| |
| git checkout -b MyBranch |
| cd tools/clang |
| git checkout -b MyBranch |
| cd ../../projects/libcxx |
| git checkout -b MyBranch |
| |
| To switch branches:: |
| |
| git checkout AnotherBranch |
| cd tools/clang |
| git checkout AnotherBranch |
| cd ../../projects/libcxx |
| git checkout AnotherBranch |
| |
| .. _workflow-mono-branching: |
| |
| Monorepo Variant |
| ^^^^^^^^^^^^^^^^ |
| |
| Regular Git commands are sufficient, because everything is in a single |
| repository: |
| |
| To update the repository to tip of trunk:: |
| |
| git pull |
| |
| To create a new branch:: |
| |
| git checkout -b MyBranch |
| |
| To switch branches:: |
| |
| git checkout AnotherBranch |
| |
| Bisecting |
| --------- |
| |
| Assuming a developer is looking for a bug in clang (or lld, or lldb, ...). |
| |
| Currently |
| ^^^^^^^^^ |
| |
| SVN does not have builtin bisection support, but the single revision across |
| sub-projects makes it possible to script around. |
| |
| Using the existing Git read-only view of the repositories, it is possible to use |
| the native Git bisection script over the llvm repository, and use some scripting |
| to synchronize the clang repository to match the llvm revision. |
| |
| .. _workflow-mono-bisecting: |
| |
| Monorepo Variant |
| ^^^^^^^^^^^^^^^^ |
| |
| Bisecting on the monorepo is straightforward, and very similar to the above, |
| except that the bisection script does not need to include the |
| `git submodule update` step. |
| |
| The same example, finding which commit introduces a regression where clang-3.9 |
| crashes but not clang-3.8 passes, will look like:: |
| |
| git bisect start releases/3.9.x releases/3.8.x |
| git bisect run ./bisect_script.sh |
| |
| With the `bisect_script.sh` script being:: |
| |
| #!/bin/sh |
| cd $BUILD_DIR |
| |
| ninja clang || exit 125 # an exit code of 125 asks "git bisect" |
| # to "skip" the current commit |
| |
| ./bin/clang some_crash_test.cpp |
| |
| Also, since the monorepo handles commits update across multiple projects, you're |
| less like to encounter a build failure where a commit change an API in LLVM and |
| another later one "fixes" the build in clang. |
| |
| Moving Local Branches to the Monorepo |
| ===================================== |
| |
| Suppose you have been developing against the existing LLVM git |
| mirrors. You have one or more git branches that you want to migrate |
| to the "final monorepo". |
| |
| The simplest way to migrate such branches is with the |
| ``migrate-downstream-fork.py`` tool at |
| https://github.com/jyknight/llvm-git-migration. |
| |
| Basic migration |
| --------------- |
| |
| Basic instructions for ``migrate-downstream-fork.py`` are in the |
| Python script and are expanded on below to a more general recipe:: |
| |
| # Make a repository which will become your final local mirror of the |
| # monorepo. |
| mkdir my-monorepo |
| git -C my-monorepo init |
| |
| # Add a remote to the monorepo. |
| git -C my-monorepo remote add upstream/monorepo https://github.com/llvm/llvm-project.git |
| |
| # Add remotes for each git mirror you use, from upstream as well as |
| # your local mirror. All projects are listed here but you need only |
| # import those for which you have local branches. |
| my_projects=( clang |
| clang-tools-extra |
| compiler-rt |
| debuginfo-tests |
| libcxx |
| libcxxabi |
| libunwind |
| lld |
| lldb |
| llvm |
| openmp |
| polly ) |
| for p in ${my_projects[@]}; do |
| git -C my-monorepo remote add upstream/split/${p} https://github.com/llvm-mirror/${p}.git |
| git -C my-monorepo remote add local/split/${p} https://my.local.mirror.org/${p}.git |
| done |
| |
| # Pull in all the commits. |
| git -C my-monorepo fetch --all |
| |
| # Run migrate-downstream-fork to rewrite local branches on top of |
| # the upstream monorepo. |
| ( |
| cd my-monorepo |
| migrate-downstream-fork.py \ |
| refs/remotes/local \ |
| refs/tags \ |
| --new-repo-prefix=refs/remotes/upstream/monorepo \ |
| --old-repo-prefix=refs/remotes/upstream/split \ |
| --source-kind=split \ |
| --revmap-out=monorepo-map.txt |
| ) |
| |
| # Octopus-merge the resulting local split histories to unify them. |
| |
| # Assumes local work on local split mirrors is on master (and |
| # upstream is presumably represented by some other branch like |
| # upstream/master). |
| my_local_branch="master" |
| |
| git -C my-monorepo branch --no-track local/octopus/master \ |
| $(git -C my-monorepo merge-base refs/remotes/upstream/monorepo/master \ |
| refs/remotes/local/split/llvm/${my_local_branch}) |
| git -C my-monorepo checkout local/octopus/${my_local_branch} |
| |
| subproject_branches=() |
| for p in ${my_projects[@]}; do |
| subproject_branch=${p}/local/monorepo/${my_local_branch} |
| git -C my-monorepo branch ${subproject_branch} \ |
| refs/remotes/local/split/${p}/${my_local_branch} |
| if [[ "${p}" != "llvm" ]]; then |
| subproject_branches+=( ${subproject_branch} ) |
| fi |
| done |
| |
| git -C my-monorepo merge ${subproject_branches[@]} |
| |
| for p in ${my_projects[@]}; do |
| subproject_branch=${p}/local/monorepo/${my_local_branch} |
| git -C my-monorepo branch -d ${subproject_branch} |
| done |
| |
| # Create local branches for upstream monorepo branches. |
| for ref in $(git -C my-monorepo for-each-ref --format="%(refname)" \ |
| refs/remotes/upstream/monorepo); do |
| upstream_branch=${ref#refs/remotes/upstream/monorepo/} |
| git -C my-monorepo branch upstream/${upstream_branch} ${ref} |
| done |
| |
| The above gets you to a state like the following:: |
| |
| U1 - U2 - U3 <- upstream/master |
| \ \ \ |
| \ \ - Llld1 - Llld2 - |
| \ \ \ |
| \ - Lclang1 - Lclang2-- Lmerge <- local/octopus/master |
| \ / |
| - Lllvm1 - Lllvm2----- |
| |
| Each branched component has its branch rewritten on top of the |
| monorepo and all components are unified by a giant octopus merge. |
| |
| If additional active local branches need to be preserved, the above |
| operations following the assignment to ``my_local_branch`` should be |
| done for each branch. Ref paths will need to be updated to map the |
| local branch to the corresponding upstream branch. If local branches |
| have no corresponding upstream branch, then the creation of |
| ``local/octopus/<local branch>`` need not use ``git-merge-base`` to |
| pinpoint its root commit; it may simply be branched from the |
| appropriate component branch (say, ``llvm/local_release_X``). |
| |
| Zipping local history |
| --------------------- |
| |
| The octopus merge is suboptimal for many cases, because walking back |
| through the history of one component leaves the other components fixed |
| at a history that likely makes things unbuildable. |
| |
| Some downstream users track the order commits were made to subprojects |
| with some kind of "umbrella" project that imports the project git |
| mirrors as submodules, similar to the multirepo umbrella proposed |
| above. Such an umbrella repository looks something like this:: |
| |
| UM1 ---- UM2 -- UM3 -- UM4 ---- UM5 ---- UM6 ---- UM7 ---- UM8 <- master |
| | | | | | | | |
| Lllvm1 Llld1 Lclang1 Lclang2 Lllvm2 Llld2 Lmyproj1 |
| |
| The vertical bars represent submodule updates to a particular local |
| commit in the project mirror. ``UM3`` in this case is a commit of |
| some local umbrella repository state that is not a submodule update, |
| perhaps a ``README`` or project build script update. Commit ``UM8`` |
| updates a submodule of local project ``myproj``. |
| |
| The tool ``zip-downstream-fork.py`` at |
| https://github.com/greened/llvm-git-migration/tree/zip can be used to |
| convert the umbrella history into a monorepo-based history with |
| commits in the order implied by submodule updates:: |
| |
| U1 - U2 - U3 <- upstream/master |
| \ \ \ |
| \ -----\--------------- local/zip--. |
| \ \ \ | |
| - Lllvm1 - Llld1 - UM3 - Lclang1 - Lclang2 - Lllvm2 - Llld2 - Lmyproj1 <-' |
| |
| |
| The ``U*`` commits represent upstream commits to the monorepo master |
| branch. Each submodule update in the local ``UM*`` commits brought in |
| a subproject tree at some local commit. The trees in the ``L*1`` |
| commits represent merges from upstream. These result in edges from |
| the ``U*`` commits to their corresponding rewritten ``L*1`` commits. |
| The ``L*2`` commits did not do any merges from upstream. |
| |
| Note that the merge from ``U2`` to ``Lclang1`` appears redundant, but |
| if, say, ``U3`` changed some files in upstream clang, the ``Lclang1`` |
| commit appearing after the ``Llld1`` commit would actually represent a |
| clang tree *earlier* in the upstream clang history. We want the |
| ``local/zip`` branch to accurately represent the state of our umbrella |
| history and so the edge ``U2 -> Lclang1`` is a visual reminder of what |
| clang's tree actually looks like in ``Lclang1``. |
| |
| Even so, the edge ``U3 -> Llld1`` could be problematic for future |
| merges from upstream. git will think that we've already merged from |
| ``U3``, and we have, except for the state of the clang tree. One |
| possible mitigation strategy is to manually diff clang between ``U2`` |
| and ``U3`` and apply those updates to ``local/zip``. Another, |
| possibly simpler strategy is to freeze local work on downstream |
| branches and merge all submodules from the latest upstream before |
| running ``zip-downstream-fork.py``. If downstream merged each project |
| from upstream in lockstep without any intervening local commits, then |
| things should be fine without any special action. We anticipate this |
| to be the common case. |
| |
| The tree for ``Lclang1`` outside of clang will represent the state of |
| things at ``U3`` since all of the upstream projects not participating |
| in the umbrella history should be in a state respecting the commit |
| ``U3``. The trees for llvm and lld should correctly represent commits |
| ``Lllvm1`` and ``Llld1``, respectively. |
| |
| Commit ``UM3`` changed files not related to submodules and we need |
| somewhere to put them. It is not safe in general to put them in the |
| monorepo root directory because they may conflict with files in the |
| monorepo. Let's assume we want them in a directory ``local`` in the |
| monorepo. |
| |
| **Example 1: Umbrella looks like the monorepo** |
| |
| For this example, we'll assume that each subproject appears in its own |
| top-level directory in the umbrella, just as they do in the monorepo . |
| Let's also assume that we want the files in directory ``myproj`` to |
| appear in ``local/myproj``. |
| |
| Given the above run of ``migrate-downstream-fork.py``, a recipe to |
| create the zipped history is below:: |
| |
| # Import any non-LLVM repositories the umbrella references. |
| git -C my-monorepo remote add localrepo \ |
| https://my.local.mirror.org/localrepo.git |
| git fetch localrepo |
| |
| subprojects=( clang clang-tools-extra compiler-rt debuginfo-tests libclc |
| libcxx libcxxabi libunwind lld lldb llgo llvm openmp |
| parallel-libs polly pstl ) |
| |
| # Import histories for upstream split projects (this was probably |
| # already done for the ``migrate-downstream-fork.py`` run). |
| for project in ${subprojects[@]}; do |
| git remote add upstream/split/${project} \ |
| https://github.com/llvm-mirror/${subproject}.git |
| git fetch umbrella/split/${project} |
| done |
| |
| # Import histories for downstream split projects (this was probably |
| # already done for the ``migrate-downstream-fork.py`` run). |
| for project in ${subprojects[@]}; do |
| git remote add local/split/${project} \ |
| https://my.local.mirror.org/${subproject}.git |
| git fetch local/split/${project} |
| done |
| |
| # Import umbrella history. |
| git -C my-monorepo remote add umbrella \ |
| https://my.local.mirror.org/umbrella.git |
| git fetch umbrella |
| |
| # Put myproj in local/myproj |
| echo "myproj local/myproj" > my-monorepo/submodule-map.txt |
| |
| # Rewrite history |
| ( |
| cd my-monorepo |
| zip-downstream-fork.py \ |
| refs/remotes/umbrella \ |
| --new-repo-prefix=refs/remotes/upstream/monorepo \ |
| --old-repo-prefix=refs/remotes/upstream/split \ |
| --revmap-in=monorepo-map.txt \ |
| --revmap-out=zip-map.txt \ |
| --subdir=local \ |
| --submodule-map=submodule-map.txt \ |
| --update-tags |
| ) |
| |
| # Create the zip branch (assuming umbrella master is wanted). |
| git -C my-monorepo branch --no-track local/zip/master refs/remotes/umbrella/master |
| |
| Note that if the umbrella has submodules to non-LLVM repositories, |
| ``zip-downstream-fork.py`` needs to know about them to be able to |
| rewrite commits. That is why the first step above is to fetch commits |
| from such repositories. |
| |
| With ``--update-tags`` the tool will migrate annotated tags pointing |
| to submodule commits that were inlined into the zipped history. If |
| the umbrella pulled in an upstream commit that happened to have a tag |
| pointing to it, that tag will be migrated, which is almost certainly |
| not what is wanted. The tag can always be moved back to its original |
| commit after rewriting, or the ``--update-tags`` option may be |
| discarded and any local tags would then be migrated manually. |
| |
| **Example 2: Nested sources layout** |
| |
| The tool handles nested submodules (e.g. llvm is a submodule in |
| umbrella and clang is a submodule in llvm). The file |
| ``submodule-map.txt`` is a list of pairs, one per line. The first |
| pair item describes the path to a submodule in the umbrella |
| repository. The second pair item describes the path where trees for |
| that submodule should be written in the zipped history. |
| |
| Let's say your umbrella repository is actually the llvm repository and |
| it has submodules in the "nested sources" layout (clang in |
| tools/clang, etc.). Let's also say ``projects/myproj`` is a submodule |
| pointing to some downstream repository. The submodule map file should |
| look like this (we still want myproj mapped the same way as |
| previously):: |
| |
| tools/clang clang |
| tools/clang/tools/extra clang-tools-extra |
| projects/compiler-rt compiler-rt |
| projects/debuginfo-tests debuginfo-tests |
| projects/libclc libclc |
| projects/libcxx libcxx |
| projects/libcxxabi libcxxabi |
| projects/libunwind libunwind |
| tools/lld lld |
| tools/lldb lldb |
| projects/openmp openmp |
| tools/polly polly |
| projects/myproj local/myproj |
| |
| If a submodule path does not appear in the map, the tools assumes it |
| should be placed in the same place in the monorepo. That means if you |
| use the "nested sources" layout in your umrella, you *must* provide |
| map entries for all of the projects in your umbrella (except llvm). |
| Otherwise trees from submodule updates will appear underneath llvm in |
| the zippped history. |
| |
| Because llvm is itself the umbrella, we use --subdir to write its |
| content into ``llvm`` in the zippped history:: |
| |
| # Import any non-LLVM repositories the umbrella references. |
| git -C my-monorepo remote add localrepo \ |
| https://my.local.mirror.org/localrepo.git |
| git fetch localrepo |
| |
| subprojects=( clang clang-tools-extra compiler-rt debuginfo-tests libclc |
| libcxx libcxxabi libunwind lld lldb llgo llvm openmp |
| parallel-libs polly pstl ) |
| |
| # Import histories for upstream split projects (this was probably |
| # already done for the ``migrate-downstream-fork.py`` run). |
| for project in ${subprojects[@]}; do |
| git remote add upstream/split/${project} \ |
| https://github.com/llvm-mirror/${subproject}.git |
| git fetch umbrella/split/${project} |
| done |
| |
| # Import histories for downstream split projects (this was probably |
| # already done for the ``migrate-downstream-fork.py`` run). |
| for project in ${subprojects[@]}; do |
| git remote add local/split/${project} \ |
| https://my.local.mirror.org/${subproject}.git |
| git fetch local/split/${project} |
| done |
| |
| # Import umbrella history. We want this under a different refspec |
| # so zip-downstream-fork.py knows what it is. |
| git -C my-monorepo remote add umbrella \ |
| https://my.local.mirror.org/llvm.git |
| git fetch umbrella |
| |
| # Create the submodule map. |
| echo "tools/clang clang" > my-monorepo/submodule-map.txt |
| echo "tools/clang/tools/extra clang-tools-extra" >> my-monorepo/submodule-map.txt |
| echo "projects/compiler-rt compiler-rt" >> my-monorepo/submodule-map.txt |
| echo "projects/debuginfo-tests debuginfo-tests" >> my-monorepo/submodule-map.txt |
| echo "projects/libclc libclc" >> my-monorepo/submodule-map.txt |
| echo "projects/libcxx libcxx" >> my-monorepo/submodule-map.txt |
| echo "projects/libcxxabi libcxxabi" >> my-monorepo/submodule-map.txt |
| echo "projects/libunwind libunwind" >> my-monorepo/submodule-map.txt |
| echo "tools/lld lld" >> my-monorepo/submodule-map.txt |
| echo "tools/lldb lldb" >> my-monorepo/submodule-map.txt |
| echo "projects/openmp openmp" >> my-monorepo/submodule-map.txt |
| echo "tools/polly polly" >> my-monorepo/submodule-map.txt |
| echo "projects/myproj local/myproj" >> my-monorepo/submodule-map.txt |
| |
| # Rewrite history |
| ( |
| cd my-monorepo |
| zip-downstream-fork.py \ |
| refs/remotes/umbrella \ |
| --new-repo-prefix=refs/remotes/upstream/monorepo \ |
| --old-repo-prefix=refs/remotes/upstream/split \ |
| --revmap-in=monorepo-map.txt \ |
| --revmap-out=zip-map.txt \ |
| --subdir=llvm \ |
| --submodule-map=submodule-map.txt \ |
| --update-tags |
| ) |
| |
| # Create the zip branch (assuming umbrella master is wanted). |
| git -C my-monorepo branch --no-track local/zip/master refs/remotes/umbrella/master |
| |
| |
| Comments at the top of ``zip-downstream-fork.py`` describe in more |
| detail how the tool works and various implications of its operation. |
| |
| Importing local repositories |
| ---------------------------- |
| |
| You may have additional repositories that integrate with the LLVM |
| ecosystem, essentially extending it with new tools. If such |
| repositories are tightly coupled with LLVM, it may make sense to |
| import them into your local mirror of the monorepo. |
| |
| If such repositories participated in the umbrella repository used |
| during the zipping process above, they will automatically be added to |
| the monorepo. For downstream repositories that don't participate in |
| an umbrella setup, the ``import-downstream-repo.py`` tool at |
| https://github.com/greened/llvm-git-migration/tree/import can help with |
| getting them into the monorepo. A recipe follows:: |
| |
| # Import downstream repo history into the monorepo. |
| git -C my-monorepo remote add myrepo https://my.local.mirror.org/myrepo.git |
| git fetch myrepo |
| |
| my_local_tags=( refs/tags/release |
| refs/tags/hotfix ) |
| |
| ( |
| cd my-monorepo |
| import-downstream-repo.py \ |
| refs/remotes/myrepo \ |
| ${my_local_tags[@]} \ |
| --new-repo-prefix=refs/remotes/upstream/monorepo \ |
| --subdir=myrepo \ |
| --tag-prefix="myrepo-" |
| ) |
| |
| # Preserve release branches. |
| for ref in $(git -C my-monorepo for-each-ref --format="%(refname)" \ |
| refs/remotes/myrepo/release); do |
| branch=${ref#refs/remotes/myrepo/} |
| git -C my-monorepo branch --no-track myrepo/${branch} ${ref} |
| done |
| |
| # Preserve master. |
| git -C my-monorepo branch --no-track myrepo/master refs/remotes/myrepo/master |
| |
| # Merge master. |
| git -C my-monorepo checkout local/zip/master # Or local/octopus/master |
| git -C my-monorepo merge myrepo/master |
| |
| You may want to merge other corresponding branches, for example |
| ``myrepo`` release branches if they were in lockstep with LLVM project |
| releases. |
| |
| ``--tag-prefix`` tells ``import-downstream-repo.py`` to rename |
| annotated tags with the given prefix. Due to limitations with |
| ``fast_filter_branch.py``, unannotated tags cannot be renamed |
| (``fast_filter_branch.py`` considers them branches, not tags). Since |
| the upstream monorepo had its tags rewritten with an "llvmorg-" |
| prefix, name conflicts should not be an issue. ``--tag-prefix`` can |
| be used to more clearly indicate which tags correspond to various |
| imported repositories. |
| |
| Given this repository history:: |
| |
| R1 - R2 - R3 <- master |
| ^ |
| | |
| release/1 |
| |
| The above recipe results in a history like this:: |
| |
| U1 - U2 - U3 <- upstream/master |
| \ \ \ |
| \ -----\--------------- local/zip--. |
| \ \ \ | |
| - Lllvm1 - Llld1 - UM3 - Lclang1 - Lclang2 - Lllvm2 - Llld2 - Lmyproj1 - M1 <-' |
| / |
| R1 - R2 - R3 <-. |
| ^ | |
| | | |
| myrepo-release/1 | |
| | |
| myrepo/master--' |
| |
| Commits ``R1``, ``R2`` and ``R3`` have trees that *only* contain blobs |
| from ``myrepo``. If you require commits from ``myrepo`` to be |
| interleaved with commits on local project branches (for example, |
| interleaved with ``llvm1``, ``llvm2``, etc. above) and myrepo doesn't |
| appear in an umbrella repository, a new tool will need to be |
| developed. Creating such a tool would involve: |
| |
| 1. Modifying ``fast_filter_branch.py`` to optionally take a |
| revlist directly rather than generating it itself |
| |
| 2. Creating a tool to generate an interleaved ordering of local |
| commits based on some criteria (``zip-downstream-fork.py`` uses the |
| umbrella history as its criterion) |
| |
| 3. Generating such an ordering and feeding it to |
| ``fast_filter_branch.py`` as a revlist |
| |
| Some care will also likely need to be taken to handle merge commits, |
| to ensure the parents of such commits migrate correctly. |
| |
| Scrubbing the Local Monorepo |
| ---------------------------- |
| |
| Once all of the migrating, zipping and importing is done, it's time to |
| clean up. The python tools use ``git-fast-import`` which leaves a lot |
| of cruft around and we want to shrink our new monorepo mirror as much |
| as possible. Here is one way to do it:: |
| |
| git -C my-monorepo checkout master |
| |
| # Delete branches we no longer need. Do this for any other branches |
| # you merged above. |
| git -C my-monorepo branch -D local/zip/master || true |
| git -C my-monorepo branch -D local/octopus/master || true |
| |
| # Remove remotes. |
| git -C my-monorepo remote remove upstream/monorepo |
| |
| for p in ${my_projects[@]}; do |
| git -C my-monorepo remote remove upstream/split/${p} |
| git -C my-monorepo remote remove local/split/${p} |
| done |
| |
| git -C my-monorepo remote remove localrepo |
| git -C my-monorepo remote remove umbrella |
| git -C my-monorepo remote remove myrepo |
| |
| # Add anything else here you don't need. refs/tags/release is |
| # listed below assuming tags have been rewritten with a local prefix. |
| # If not, remove it from this list. |
| refs_to_clean=( |
| refs/original |
| refs/remotes |
| refs/tags/backups |
| refs/tags/release |
| ) |
| |
| git -C my-monorepo for-each-ref --format="%(refname)" ${refs_to_clean[@]} | |
| xargs -n1 --no-run-if-empty git -C my-monorepo update-ref -d |
| |
| git -C my-monorepo reflog expire --all --expire=now |
| |
| # fast_filter_branch.py might have gc running in the background. |
| while ! git -C my-monorepo \ |
| -c gc.reflogExpire=0 \ |
| -c gc.reflogExpireUnreachable=0 \ |
| -c gc.rerereresolved=0 \ |
| -c gc.rerereunresolved=0 \ |
| -c gc.pruneExpire=now \ |
| gc --prune=now; do |
| continue |
| done |
| |
| # Takes a LOOOONG time! |
| git -C my-monorepo repack -A -d -f --depth=250 --window=250 |
| |
| git -C my-monorepo prune-packed |
| git -C my-monorepo prune |
| |
| You should now have a trim monorepo. Upload it to your git server and |
| happy hacking! |
| |
| References |
| ========== |
| |
| .. [LattnerRevNum] Chris Lattner, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041739.html |
| .. [TrickRevNum] Andrew Trick, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041721.html |
| .. [JSonnRevNum] Joerg Sonnenberger, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041688.html |
| .. [MatthewsRevNum] Chris Matthews, http://lists.llvm.org/pipermail/cfe-dev/2016-July/049886.html |
| .. [statuschecks] GitHub status-checks, https://help.github.com/articles/about-required-status-checks/ |