|
|
@@ -0,0 +1,370 @@
|
|
|
+# Remove LLVM from the repository, and clean up history.
|
|
|
+
|
|
|
+<!--
|
|
|
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
|
|
|
+Exceptions. See /LICENSE for license information.
|
|
|
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
|
|
|
+-->
|
|
|
+
|
|
|
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/1344)
|
|
|
+
|
|
|
+<!-- toc -->
|
|
|
+
|
|
|
+## Table of contents
|
|
|
+
|
|
|
+- [Problem](#problem)
|
|
|
+- [Proposal](#proposal)
|
|
|
+- [Details](#details)
|
|
|
+ - [Implementation](#implementation)
|
|
|
+ - [Patching LLVM](#patching-llvm)
|
|
|
+ - [Updating forks and clones](#updating-forks-and-clones)
|
|
|
+ - [Archive your existing fork and clones](#archive-your-existing-fork-and-clones)
|
|
|
+ - [Create a fresh fork and clone](#create-a-fresh-fork-and-clone)
|
|
|
+ - [Porting in-progress branches from your archived clone](#porting-in-progress-branches-from-your-archived-clone)
|
|
|
+ - [Updating pending PRs](#updating-pending-prs)
|
|
|
+ - [Review comments may be disrupted](#review-comments-may-be-disrupted)
|
|
|
+- [Rationale](#rationale)
|
|
|
+- [Alternatives considered](#alternatives-considered)
|
|
|
+ - [Do nothing](#do-nothing)
|
|
|
+ - [Don't rewrite the repository history.](#dont-rewrite-the-repository-history)
|
|
|
+ - [Go back to submodules](#go-back-to-submodules)
|
|
|
+ - [Rename the repository, and create a new one](#rename-the-repository-and-create-a-new-one)
|
|
|
+ - [Manually extract and archive some review comments](#manually-extract-and-archive-some-review-comments)
|
|
|
+
|
|
|
+<!-- tocstop -->
|
|
|
+
|
|
|
+## Problem
|
|
|
+
|
|
|
+We have tried both Git submodules and subtrees to manage accessing the LLVM
|
|
|
+source code as part of Carbon. We have had significant issues with both of these
|
|
|
+over time.
|
|
|
+
|
|
|
+Git submodules are well supported in general, but make the repository
|
|
|
+significantly less user-friendly. There are a number of common operations, not
|
|
|
+least the initial clone of the repository, that are made frustratingly more
|
|
|
+complex in the presence of a submodule. In some cases (switching branches), it
|
|
|
+can even cause hard to diagnose errors for users. It also doesn't directly help
|
|
|
+with carrying local patches. Because of all of these, we switched to subtrees.
|
|
|
+
|
|
|
+Git subtrees in some ways work much better once set up as they are simply a
|
|
|
+normal directory for most users. They also make it especially easy to make and
|
|
|
+carry local patches. However, setting up the subtree and updating the subtree
|
|
|
+are bug prone and interact very poorly with GitHub's pull request workflow. The
|
|
|
+result is that updating LLVM is currently both very difficult and error prone.
|
|
|
+
|
|
|
+Another fundamental problem subtrees expose is that the LLVM repository is
|
|
|
+_huge_. It includes large and complex projects that Carbon is unlikely to ever
|
|
|
+use, and it doesn't make sense for us to start our repository off with all of
|
|
|
+that history and space if we can avoid it. Including LLVM causes things like
|
|
|
+`git status` or cloning the repository to be dramatically more expensive without
|
|
|
+significant benefit.
|
|
|
+
|
|
|
+## Proposal
|
|
|
+
|
|
|
+We should switch to using Bazel to download a snapshot of LLVM during the build,
|
|
|
+and do a destructive history re-write to the Carbon repository to completely
|
|
|
+remove LLVM from it. All of the building against LLVM will continue to work
|
|
|
+seamlessly. However, the repository will become tiny and fast to work with, and
|
|
|
+the snapshot download is significantly cheaper than cloning or working with the
|
|
|
+full-history of LLVM.
|
|
|
+
|
|
|
+The only way to capture the benefit here is to do a destructive update to the
|
|
|
+repository's history. This is unfortunate, but essential to do as soon as
|
|
|
+possible and before we shift to be public. Despite the cost of making this
|
|
|
+change, the cost of _not_ making it will grow without bound for the life of the
|
|
|
+project.
|
|
|
+
|
|
|
+**Important:** This _will_ require a manual update of some kind for every fork
|
|
|
+and clone. The steps to update them are provided below.
|
|
|
+
|
|
|
+## Details
|
|
|
+
|
|
|
+### Implementation
|
|
|
+
|
|
|
+<!-- google-doc-style-ignore -->
|
|
|
+
|
|
|
+This will be accomplished using the
|
|
|
+[`git-filter-repo` tool](https://github.com/newren/git-filter-repo). This makes
|
|
|
+it very easy to prune any trees from the history, replaying the history left and
|
|
|
+resulting in a clean and clear result.
|
|
|
+
|
|
|
+<!-- google-doc-style-resume -->
|
|
|
+
|
|
|
+We will remove three other trees while here that are no longer used:
|
|
|
+
|
|
|
+- `src/jekyll/`
|
|
|
+- `src/firebase/`
|
|
|
+- `website/`
|
|
|
+
|
|
|
+Once we're taking a history break, we should capture the value we can and have a
|
|
|
+minimal history of the repository we are actually using. The largest of these is
|
|
|
+`src/jekyll` (988kb even packed), but it seems cleanest to remove the three
|
|
|
+together.
|
|
|
+
|
|
|
+### Patching LLVM
|
|
|
+
|
|
|
+We still need to support patching LLVM. Initially, we will start with a simple
|
|
|
+approach of applying a collection of patch files from the Carbon repository to
|
|
|
+the LLVM snapshot. This is especially convenient while Carbon's repositories are
|
|
|
+not public. Eventually, we can switch to having a fork of LLVM that we snapshot
|
|
|
+and developing any needed patches there. Having a more robust long-term solution
|
|
|
+is expected to be important if larger scale development is needed on Clang when
|
|
|
+building interoperability support.
|
|
|
+
|
|
|
+<!-- google-doc-style-ignore -->
|
|
|
+
|
|
|
+Developing patches is also very easy when using Bazel. You can pass
|
|
|
+`--override_repository=llvm-project=<directory>`
|
|
|
+([docs](https://bazel.build/reference/command-line-reference#flag--override_repository))
|
|
|
+to have Bazel use a local checkout of LLVM (with any patches you are testing)
|
|
|
+rather than downloading it. If doing significant development, this can be added
|
|
|
+to your `user.bazelrc` file to consistently use your development area.
|
|
|
+
|
|
|
+<!-- google-doc-style-resume -->
|
|
|
+
|
|
|
+### Updating forks and clones
|
|
|
+
|
|
|
+If you have a fork and any clones of the repository with work that you want to
|
|
|
+save, we suggest the following process:
|
|
|
+
|
|
|
+#### Archive your existing fork and clones
|
|
|
+
|
|
|
+1. Go to the archived repository:
|
|
|
+ https://github.com/carbon-language/archived-carbon-lang
|
|
|
+
|
|
|
+2. Fork this archived repository just like you normally would (into your
|
|
|
+ _personal_ space). It is important to fork the `carbon-language` hosted
|
|
|
+ archived repository so you pick up the ACLs. You should **not** create your
|
|
|
+ own personal repository.
|
|
|
+
|
|
|
+3. Update any clones to use these archived remotes to avoid accidentally trying
|
|
|
+ to interact with the new repository. Example commands assuming the remote
|
|
|
+ `origin` points to your fork and and `upstream` points to the main
|
|
|
+ repository. You may also need to adjust from the SSH-style URLs to HTTPS
|
|
|
+ ones to match:
|
|
|
+
|
|
|
+ ```
|
|
|
+ git remote set-url upstream git@github.com:carbon-language/archived-carbon-lang.git git@github.com:carbon-language/carbon-lang.git
|
|
|
+ git remote set-url origin git@github.com:$USER/archived-carbon-lang.git git@github.com:$USER/carbon-lang.git
|
|
|
+ ```
|
|
|
+
|
|
|
+4. For each clone, mirror everything into the new `origin` (your fork):
|
|
|
+
|
|
|
+ ```
|
|
|
+ git push --mirror origin
|
|
|
+ ```
|
|
|
+
|
|
|
+ If you have multiple clones with different but overlapping branches, this
|
|
|
+ may require some extra steps to avoid collisions when doing the mirror push.
|
|
|
+
|
|
|
+At this point, your clones should all point to an archival fork and should be
|
|
|
+able to function "normally" cleanly. The goal here is just archival and making
|
|
|
+sure you don't lose anything.
|
|
|
+
|
|
|
+#### Create a fresh fork and clone
|
|
|
+
|
|
|
+Delete your existing fork of `carbon-lang` (_not_ `archived-carbon-lang`):
|
|
|
+
|
|
|
+1. Go to your fork's settings page:
|
|
|
+ `https://github.com/$USER/carbon-lang/settings`. Be certain this is your
|
|
|
+ _fork_ and not the `carbon-language` organization.
|
|
|
+2. Scroll to the bottom, and click `Delete this repository`.
|
|
|
+
|
|
|
+Once deleted, create a fresh fork from the now-filtered main repository:
|
|
|
+https://github.com/carbon-language/carbon-lang/fork
|
|
|
+
|
|
|
+Clone this fork and you're ready to go with the new repository structure.
|
|
|
+
|
|
|
+#### Porting in-progress branches from your archived clone
|
|
|
+
|
|
|
+For any branch in your archive clones that you want to import, there are two
|
|
|
+approaches.
|
|
|
+
|
|
|
+The simplest way is to extract the branch as a patch file and apply it in the
|
|
|
+new repository:
|
|
|
+
|
|
|
+```
|
|
|
+# If your old clone is in the `archived-carbon-lang` directory and your new
|
|
|
+# clone is in `carbon-lang`, both of them on the `trunk` branch:
|
|
|
+cd archived-carbon-lang
|
|
|
+git switch $BRANCH
|
|
|
+git diff upstream/trunk... > ../$BRANCH.patch
|
|
|
+cd ../carbon-lang
|
|
|
+git switch -c $BRANCH
|
|
|
+patch -p1 < ../$BRANCH.patch
|
|
|
+git commit
|
|
|
+```
|
|
|
+
|
|
|
+This will just flatten the branch into a single commit in the new clone.
|
|
|
+
|
|
|
+If you want to preserve the commit history, you can do that using
|
|
|
+`git format-patch`. Some points to remember here:
|
|
|
+
|
|
|
+- Make sure you've rebased the branch into a clean series of commits on top of
|
|
|
+ the latest trunk in the archived repository.
|
|
|
+- Use `git format-patch` to get a series of patches in a directory. See its
|
|
|
+ help for detailed usage.
|
|
|
+- Use `git am` to apply the series of patches in your new clone under some
|
|
|
+ branch. Again, see its help for detailed usage.
|
|
|
+
|
|
|
+#### Updating pending PRs
|
|
|
+
|
|
|
+The pending PRs should end up closed automatically when you delete your fork
|
|
|
+above. If not, the simplest approach is to close them and just create a new PR.
|
|
|
+In-flight discussion comment threads will be awkward, but there isn't much else
|
|
|
+to do. Rename the proposal document if needed.
|
|
|
+
|
|
|
+We will explore whether we can re-open PRs with a fresh forced push from the
|
|
|
+fresh fork, but aren't confident in this working. It also isn't necessary, and
|
|
|
+we only have 12 PRs open at the moment.
|
|
|
+
|
|
|
+### Review comments may be disrupted
|
|
|
+
|
|
|
+When we edit the repository, some PR comments (especially pending PRs that are
|
|
|
+updated to follow a freshly created fork) may lose their association with the
|
|
|
+line of code they were made against, and it is possible we may be unable to find
|
|
|
+some comments. This will at most apply to inline comments within the code of
|
|
|
+PRs, but that is the common case for PRs. We expect most of these to still be
|
|
|
+somewhat visible in the conversation view of the pull request, but they may be
|
|
|
+hard to find due to no longer being attached to a line.
|
|
|
+
|
|
|
+## Rationale
|
|
|
+
|
|
|
+- [Community and culture](/docs/project/goals.md#community-and-culture)
|
|
|
+ - This will make it even easier and smoother for folks to clone, build,
|
|
|
+ and even contribute.
|
|
|
+ - It will make clones significantly cheaper, especially those not building
|
|
|
+ code but just working on documentation.
|
|
|
+- [Language tools and ecosystem](/docs/project/goals.md#language-tools-and-ecosystem)
|
|
|
+ - Preserving the ability to build with LLVM and develop patches will be
|
|
|
+ important as we expand the set of language tools and ecosystem being
|
|
|
+ developed.
|
|
|
+
|
|
|
+## Alternatives considered
|
|
|
+
|
|
|
+### Do nothing
|
|
|
+
|
|
|
+We could simply continue using the moderately broken Git subtree approach.
|
|
|
+
|
|
|
+Advantages:
|
|
|
+
|
|
|
+- No need to change our approach.
|
|
|
+- No destructive operation on the repository requiring everyone to update.
|
|
|
+
|
|
|
+Disadvantages:
|
|
|
+
|
|
|
+- Every update of LLVM remains difficult, manual, and error prone.
|
|
|
+ - We're currently carrying an LLVM patch, and it's easy to lose (and has
|
|
|
+ been lost) in updates.
|
|
|
+- Will continually hit bugs in Git subtree as it seems both brittle and not a
|
|
|
+ priority. We will have to struggle to understand and fix or work around
|
|
|
+ them.
|
|
|
+- The repository remains massive, slow, and contains significant LLVM code
|
|
|
+ that we will never use.
|
|
|
+- If we ever have users that wish to import the Carbon repository into some
|
|
|
+ other environment, they will have to pay the cost of LLVM or remove it
|
|
|
+ somehow. It may even cause them to have two copies of LLVM.
|
|
|
+
|
|
|
+We think this problem is worth solving.
|
|
|
+
|
|
|
+### Don't rewrite the repository history.
|
|
|
+
|
|
|
+We could fix this without rewriting history. If we choose not to rewrite history
|
|
|
+now, it should be noted that the cost of rewriting history only grows and so we
|
|
|
+should expect to _never_ rewrite history.
|
|
|
+
|
|
|
+Advantages:
|
|
|
+
|
|
|
+- No need for manual steps to update forks, clones, in-flight patches, etc.
|
|
|
+- Commit hashes remain stable.
|
|
|
+- All code review comment information associated with commit hashes will be
|
|
|
+ retained.
|
|
|
+
|
|
|
+Disadvantages:
|
|
|
+
|
|
|
+- We pay the cost of having imported LLVM forever. Even if this cost is
|
|
|
+ incrementally small (some amount of repository space when cloning with
|
|
|
+ history), it is a cost we will never stop paying.
|
|
|
+
|
|
|
+While the immediate costs are high, the unbounded time frame for which we will
|
|
|
+pay for leaving LLVM in the history means that will eventually be the dominant
|
|
|
+cost and we should just rewrite history, and the sooner the better.
|
|
|
+
|
|
|
+### Go back to submodules
|
|
|
+
|
|
|
+Rather than switching to use Bazel downloaded snapshots, we could go back to
|
|
|
+using Git submodules. The original motivation to move away from submodules was
|
|
|
+needing to carry a local patch. Submodules and the proposed direction have the
|
|
|
+same options there, and our approach is discussed above.
|
|
|
+
|
|
|
+Advantages:
|
|
|
+
|
|
|
+- Somewhat more of a Git-native approach.
|
|
|
+- Would avoid needing to invent another approach if we add a non-Bazel build
|
|
|
+ system.
|
|
|
+
|
|
|
+Disadvantages:
|
|
|
+
|
|
|
+- Much of the cloning cost and other Git command costs we see with subtrees
|
|
|
+ would still be present.
|
|
|
+- It makes working with the Git repository even more tricky to get right.
|
|
|
+- While less esoteric than subtrees, it still exposes a less polished surface
|
|
|
+ of Git that we will have to cope with.
|
|
|
+- Most notably, this will re-introduce the stumbling block of users first
|
|
|
+ encountering Carbon not having a seamless experience.
|
|
|
+
|
|
|
+We think we should try something simpler here, even if a bit less of a
|
|
|
+native-Git solution. It is also especially important for us to optimize the
|
|
|
+initial new contributor flow, and the Bazel approach is expected to be more
|
|
|
+seamless.
|
|
|
+
|
|
|
+### Rename the repository, and create a new one
|
|
|
+
|
|
|
+Rather than editing the repository in place, this would move it aside and create
|
|
|
+a new one with the edited history.
|
|
|
+
|
|
|
+Advantages:
|
|
|
+
|
|
|
+- Some disruptive aspects of the in-place history edit would be avoided,
|
|
|
+ specifically parts of PRs that are associated with commit hashes would
|
|
|
+ likely continue working due to the commit hashes being preserved.
|
|
|
+
|
|
|
+Disadvantages:
|
|
|
+
|
|
|
+- For most cases, this would be equally disruptive -- forks and clones would
|
|
|
+ have the same update needed as they reference the repository by name.
|
|
|
+- We would either need to build a system to carefully re-create the exact
|
|
|
+ issue numbering and PR numbering, which may not even be possible, or we
|
|
|
+ would need to allow all issue numbers and PR numbers to churn.
|
|
|
+ - The PR numbers churning is especially disruptive as the commit log
|
|
|
+ references them to connect commits to the code review that led to the
|
|
|
+ commit. They are also the basis of the proposal numbers.
|
|
|
+ - There isn't likely a way to move the PRs back to the main repository, so
|
|
|
+ browsing the historical code reviews would become problematic.
|
|
|
+
|
|
|
+This seems to largely trade off preservation of some of the comment history
|
|
|
+within PRs for preserving the location, numbers, and links of all PRs and
|
|
|
+issues. That doesn't seem like the right tradeoff.
|
|
|
+
|
|
|
+### Manually extract and archive some review comments
|
|
|
+
|
|
|
+We could attempt to extract and archive review comments in case they are lost or
|
|
|
+made hard to find by the change.
|
|
|
+
|
|
|
+Advantages:
|
|
|
+
|
|
|
+- Defense in depth against any information loss here.
|
|
|
+
|
|
|
+Disadvantages:
|
|
|
+
|
|
|
+- Would be a decent amount of work to get this right.
|
|
|
+- May not lose much information here. We think the comments will still be in
|
|
|
+ the conversation view, but maybe hard to find.
|
|
|
+- Unclear whether any of these will actually have value, especially extracted
|
|
|
+ and out of context from the review where they were made.
|
|
|
+- Serious concerns about arbitrarily scraping and moving comments other folks
|
|
|
+ authored outside of the system (GitHub) that they authored them within.
|
|
|
+
|
|
|
+Overall, while there is some risk here, we don't think it is too high and the
|
|
|
+cost of trying to mitigate this seems unreasonable given the relatively small
|
|
|
+total amount of data at issue here.
|