Building on this question, I have a workflow where I'm constantly making PRs on top of PRs to make it easier for others to review my work. Goal is to have smaller PR sizes. So I often end up with situations like the following:
                      G--H--I   <-- branch3
                     /
              D--E--F   <-- branch2
             /
      A--B--C   <-- branch1
     /
    M   <-- master
And so on for N branches after branch3. The problem is, after I squash and merge branch1, I have to manually rebase branches 2, 3...N:
                      G--H--I   <-- branch3
                     /
              D--E--F   <-- branch2
             /
      A--B--C
     /
    M--S   <-- master, origin/master (branch1 changes are squashed in S)
In the above case, I have to run:
    git checkout branch2
    git rebase --onto master (SHA-1 of C)
    git checkout branch3
    git rebase --onto branch2 (SHA-1 of F)
And so on...
Is there a way to automate this process by rebasing all branches automatically with a script? What I can't figure out is a way to automatically detect the correct SHA-1 to pass as parameter for each rebase.
There are a couple of fundamental problems, or maybe one fundamental problem, depending on how you look at it. That is:
Let's start with a question that seems straightforward but, because Git is Git, is actually a trick question: which branch holds commits A-B-C?

There isn't a general solution to this problem. If you have exactly the situation you have drawn, however, there is a specific solution to your specific situation, but you'll have to write it yourself.
The answer to the trick question is that commits A-B-C are on every branch except master. A branch name like branch3 just identifies one particular commit, in this case commit I. That commit identifies another commit, in this case commit H. Each commit always identifies some previous commit (or, in the case of a merge commit, two or more previous commits) and Git simply works backwards from the end. "The end" is precisely that commit whose hash ID is stored in the branch name.

Branch names lack parent/child relationships because every branch name can be moved or destroyed at any time without changing the hash ID stored in each other branch. New names can be created at any time too: the only constraint on creating a new name is that you must pick some existing commit for that name to point to.
The commits have parent/child relationships, but the names do not. This leads to the solution to this specific situation, though. If commit Y is a descendant of commit X, that means there's some backwards path where we start at Y and can work our way back to X. This relationship is ordered (mathematically speaking, it forms a partial order over the set of commits), so that if X ≺ Y (X precedes Y, i.e., X is an ancestor of Y), then Y ≻ X (Y succeeds X: Y is a descendant of X).
So we take our set of names, translate each name to a commit hash ID, and perform these is-ancestor tests. Git's "is-ancestor" operator actually tests for ≼ (precedes or is equal to), and the is-equal case occurs with:
where both names select the same commit. If that could occur we would have to analyze what our code might do with that case. It turns out that this usually doesn't require any special work at all (though I won't bother proving this).
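As a rough sketch of these tests, the shell function below takes a set of branch names and prints the one whose commit has every other named commit as an ancestor (or is equal to it). The name find_tip is hypothetical, and the sketch assumes the names form a single chain as in the drawing; if they do not, it simply reports failure.

```shell
# find_tip NAME...
# Hypothetical helper: print the name whose commit is the "last" one,
# i.e. every other named commit precedes or equals it.
# Returns 1 if no single name qualifies (the names do not form one chain).
find_tip() {
    local candidate other ok
    for candidate in "$@"; do
        ok=yes
        for other in "$@"; do
            # Exit status 0 means: $other precedes-or-equals $candidate.
            if ! git merge-base --is-ancestor "$other" "$candidate"; then
                ok=no
                break
            fi
        done
        if [ "$ok" = yes ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

It could be called as tip=$(find_tip branch2 branch3). When two names select the same commit, the first one listed wins, which is harmless for picking the commit to start from.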
Having found the "last" commit—the one for which every commit comes "before" the commit in question—we now need to do our rebase operation. We have:
just as you showed, and we know that S represents the A-B-C sequence because we picked commit C (via the name branch1) when we made S. Since the last commit is commit I, we want to copy, as rebase does, every commit from D through I, with the copies landing after S. It might be best if Git didn't move any of these branch names at all during the copying operation, and we can get that to happen using Git's detached HEAD mode:
or:
which gets us:
We now run git rebase --onto master branch1 if the name branch1 is still available, or git rebase --onto master <hash-of-C> if not. This copies everything as desired:

Now all (?) we need to do is go back through those same sets of branch names and count how far they are along the chain of original commits. Because of the way Git works (backwards), we'll do this starting from wherever they end and working backwards to commit C. For this particular drawing, that's 3 for branch2 and 6 for branch3. We count how many commits we copied as well, which is also of course 6. So we subtract 3 from 6 for branch2, and 6 from 6 for branch3. That tells us where we should move those branch names now: zero steps back from I' for branch3, and three steps back from I' for branch2. So now we make one last loop through each name and re-set each name as appropriate.

(Then we probably should pick some name to git checkout or git switch to.)

There are some challenges here:
Where did we get this set of names? The names are branch1, branch2, branch3, and so on, but in reality they won't be so obviously related: why do we move branch fred but not branch barney?

How did we know that branch1 is the one that we shouldn't use here, but should use as the "don't copy this commit" argument to our git rebase-with-detached-HEAD?

How exactly do we do this is-ancestor / is-descendant test?

This question actually has an answer: git merge-base --is-ancestor is the test. You give it two commit hash IDs and it reports whether the left-hand one is an ancestor of the right-hand one: git merge-base --is-ancestor X Y tests X ≼ Y. Its result is its exit status, suitable for use in shell scripts with the if built-in.

How do we count commits?

This question also has an answer: git rev-list --count stop..start starts at the start commit and works backwards. It stops working backwards when it reaches stop or any of its ancestors. It then reports a count of the number of commits visited.

How do we move a branch name? How do we figure out which commit to land on?

This one is easy: git branch -f will let us move an existing branch name, as long as we do not have that name currently checked out. As we are on a detached HEAD after the copying process, we have no name checked out, so all names can be moved. Git itself can do the counting-back, using the tilde and numeric suffix syntax: HEAD~0 is commit I', HEAD~1 is commit H', HEAD~2 is commit G', HEAD~3 is commit F', and so on. Given a number $n we just write HEAD~$n, so git branch -f $name HEAD~$n does the job.

You still have to solve the first two questions. The solution to that will be specific to your particular situation.
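Putting the answered pieces together, here is a minimal sketch of the copy-then-move step, assuming the first two questions are already solved and the caller supplies the names. The function name restack and its argument order are hypothetical. It also assumes a strictly linear chain and a rebase that copies every commit one-for-one with no conflicts; if the rebase drops or reorders commits, the step-back arithmetic no longer holds.

```shell
# restack ONTO OLD_BASE TIP NAME...
#   ONTO      where the copies should land (e.g. master, now at S)
#   OLD_BASE  last commit NOT to copy (e.g. branch1, or the hash of C)
#   TIP       the name at the end of the chain (e.g. branch3)
#   NAME...   every branch name to move, TIP included
restack() {
    local onto="$1" old_base="$2" tip="$3" total name depth plan=""
    shift 3
    # Pin OLD_BASE to a hash so it stays fixed throughout.
    old_base=$(git rev-parse --verify "$old_base^{commit}") || return 1
    # Number of commits the rebase is expected to copy (6 in the drawing).
    total=$(git rev-list --count "$old_base..$tip") || return 1

    # Record, for each name, how many steps back from the new tip it belongs.
    for name in "$@"; do
        depth=$(git rev-list --count "$old_base..$name") || return 1
        plan="$plan$name $((total - depth))
"
    done

    # Copy on a detached HEAD so that no branch name moves during the rebase.
    git checkout --quiet --detach "$tip" || return 1
    git rebase --quiet --onto "$onto" "$old_base" || return 1

    # HEAD is now the copied tip (I'); re-point each name.
    printf '%s' "$plan" | while read -r name depth; do
        git branch -f "$name" "HEAD~$depth"
    done
}
```

For the drawing above this would be run as restack master branch1 branch3 branch2 branch3, which moves branch3 to HEAD~0 and branch2 to HEAD~3. It leaves you on a detached HEAD, so a git checkout or git switch to one of the names is still needed afterwards.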
Worth pointing out, and probably the reason no one has written a proper solution for this—I wrote my own approximate solution many years ago but abandoned it many years ago as well—is that this whole process breaks down if you don't have this very specific situation. Suppose that instead of:
you begin with:
This time, ending at commit I and copying all commits that reach back through, but do not include, commit C fails to copy commit F. There is no F' to allow you to move branch name branch2 after copying D-E-G-H-I to D'-E'-G'-H'-I'.

This problem was pretty major, back in the twenty-aughts and twenty-teens. But git rebase has been smartened up a bunch, with the newfangled -r (--rebase-merges) interactive rebase mode. It now has almost all the machinery for a multi-branch rebase to Just Work. There are a few missing pieces that are still kind of hard here, but if we can solve the first two problems (how do we know which branch names to multi-rebase in the first place) we could write a git multirebase command that would do the whole job.