Why should we never use rebase with commits that have been pushed?


I'm still getting my feet wet with VSTS and Git. I understand the scenario where changes in the master branch need to get into the feature branch, but these "reminders" or tips don't make sense to me yet. What is meant by the statement below? https://learn.microsoft.com/en-us/vsts/git/tutorial/rebase?tabs=visual-studio

[quote]

Never rebase commits that have been pushed and shared with others. The only exception to this rule is when you are certain no one on your team is using the commits or the branch you pushed.

After reading a bit further, and coming from SVN, I think I see why the above statement was made:

Never force push a branch that others are working on. Only force push branches that you alone work with.

This would be "similar" to a situation with SVN where (1) you have a local branch that others are working on, (2) you make a bug fix directly in trunk, (3) you merge those changes down to your local branch, and (4) you commit the updates in the local branch, thus forcing anyone else working on that local branch to get the update and potentially have a merge conflict.


There are 3 answers

torek On BEST ANSWER

Ashish Mathew's answer, which links to and quotes from the Pro Git book, is correct, but you may need a lot more background to really understand it. I'd like to start out by saying that the word "never" is too strong. It's OK to rebase published commits under one condition: everyone who will have to deal with the resulting problems has agreed, in advance, to deal with them.

But what, then, are the problems created by doing this? The answer is in that quote: rebase works by copying commits.

Git has one real "true name" for each commit, which is that commit's hash ID. That true name—the hash ID—is how Git finds the underlying data, and how, when you connect two Gits to each other, they transfer the data. (In fact, these hash IDs are used for all four of Git's internal object types, though you yourself will mostly deal with commits.)

The hash ID for any given commit is unique and looks totally random, but in fact it's completely deterministic, having been computed from the data inside the commit. (It's a cryptographic hash of that data.) Hence your Git can connect to any other Git anywhere in the entire universe, and if your Git waves a raw hash ID like 8279ed033f703d4115bee620dccd32a9ec94d9aa at the other Git, the two Gits can immediately tell whether they both have that commit, or not. If both Gits have the commit, there's nothing to do; but if only one Git has the commit, the other Git will ask to get a copy.
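To make the "deterministic" point concrete, here is a minimal Python sketch of how a SHA-1 Git repository names an object: it hashes a short header (object type and size), a NUL byte, and then the object's bytes. The helper names `git_object_id` and `commit_body`, and the fixed author line, are made up for illustration; a repository using SHA-256 would produce different IDs.

```python
import hashlib
from typing import Optional


def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 object ID: hash of the header, a NUL byte, then the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parent: Optional[str], message: str) -> bytes:
    """Build the text of a commit object (hypothetical fixed author and date)."""
    who = "A U Thor <author@example.com> 1500000000 +0000"
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += [f"author {who}", f"committer {who}", "", message, ""]
    return "\n".join(lines).encode()


empty_tree = git_object_id("tree", b"")

# Two root commits with different messages, so they get different IDs.
root = git_object_id("commit", commit_body(empty_tree, None, "initial"))
other = git_object_id("commit", commit_body(empty_tree, None, "other start"))

# Identical data in, identical ID out, no matter who computes it.
same = git_object_id("commit", commit_body(empty_tree, None, "initial"))
print(root == same)    # True

# The same change recorded on a different parent is a different commit.
child = git_object_id("commit", commit_body(empty_tree, root, "fix bug"))
moved = git_object_id("commit", commit_body(empty_tree, other, "fix bug"))
print(child == moved)  # False
```

The last comparison is the heart of the rebase problem: because the parent ID is part of the hashed data, a commit placed on a new base cannot keep its old hash ID.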

(The transfer is always one way: git fetch has your Git call up another Git and download items from them, while git push has your Git call up another Git and send items to them. There's no fundamental reason you couldn't do both at the same time, but the commands are all written with unidirectional transfer in mind.)

This ability to do a very simple have/want exchange is how Git can rapidly transfer only the necessary objects: even if you have a fairly fat repository such as that for the Linux kernel—weighing in, today, at over 700k commits and about 2.4 GB of repository database—the git fetch command is quick:

$ time git fetch

real    0m0.457s
user    0m0.228s
sys     0m0.087s
$ 

(I ran an earlier git fetch this morning, which was a lot slower as I had not updated this copy of the kernel since late last year. That one took about 3 seconds of CPU time and about 10.5 seconds of real time, to bring over 11853 objects.)

Anyway, the short version of all of this is that Git tends to be like the Borg of source control systems: whatever you have, when I connect my Git to yours, I add everything you have that I don't to my repository. I keep everything I had before!
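The "add what I lack, keep what I have" behavior can be modeled in a few lines of Python. This is only a toy: each repository is a plain dict from object ID to contents, and the IDs are placeholders. Real Git negotiates over the commit graph rather than comparing complete lists, so treat the hypothetical `fetch` function here as an illustration of the outcome, not of the protocol.

```python
def fetch(mine: dict, theirs: dict) -> list:
    """Copy into `mine` every object `theirs` has that `mine` lacks.

    Nothing is ever removed from `mine`. Returns the IDs that were transferred.
    """
    wanted = [oid for oid in theirs if oid not in mine]
    for oid in wanted:
        mine[oid] = theirs[oid]
    return wanted


mine = {"a1": "base", "b2": "my local work"}
theirs = {"a1": "base", "c3": "their new commit"}

transferred = fetch(mine, theirs)
print(transferred)          # ['c3']
print(sorted(mine))         # ['a1', 'b2', 'c3']
print(fetch(mine, theirs))  # [] because a second fetch has nothing left to copy
```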

So, if you use git rebase on published commits—commits that I have now, because I got them from you earlier—you will, as in the quote, copy some of your existing commits to new commits that you think are new-and-improved. You will then switch to the new commits, abandoning the yucky old ones that you've copied. When I connect my Git to your Git, I'll end up with both the old and the new.

That doesn't seem so bad—but the problem is that my Git treats all commits in my repository as precious, so now I have both the old ones and the new ones. I haven't abandoned the old ones. If I've built new commits that use your old ones, I now have to somehow separate the work I've built from the work you've copied.
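The situation can be sketched with a toy Python model. It is not how Git stores commits: here a commit is just a (parent, change) pair whose ID is a hash of the two, and `commit` and `rebase` are hypothetical helpers. The model keeps the two properties that matter, namely that the ID depends on the parent and that fetching only ever adds objects.

```python
import hashlib


def commit(store: dict, parent: str, change: str) -> str:
    """Add a commit to `store`; its ID is a hash of its parent ID and change."""
    oid = hashlib.sha1(f"{parent}\n{change}".encode()).hexdigest()
    store[oid] = (parent, change)
    return oid


def rebase(store: dict, commits: list, new_base: str) -> str:
    """Copy each commit's change onto `new_base`, oldest first; return the new tip.

    The originals stay in `store`; only the branch tip moves to the copies.
    """
    tip = new_base
    for oid in commits:
        _parent, change = store[oid]
        tip = commit(store, tip, change)
    return tip


# Your repository: base <- A <- B on your feature branch.
yours = {}
base = commit(yours, "", "base")
a = commit(yours, base, "add A")
b = commit(yours, a, "add B")

# I fetch your branch and build C on top of B.
mine = dict(yours)
c = commit(mine, b, "add C")

# You rebase A and B onto a new mainline commit M, then I fetch again.
m = commit(yours, base, "mainline M")
new_tip = rebase(yours, [a, b], m)
mine.update(yours)  # fetching only ever adds objects

copies_of_a = [oid for oid, (_p, change) in mine.items() if change == "add A"]
print(len(copies_of_a))  # 2: the original A and its rebased copy
print(mine[c][0] == b)   # True: my C still sits on the old B
print(new_tip == b)      # False: your branch now ends at the copy of B
```

My repository now holds the change "add A" twice under two different IDs, and my own commit C hangs off the copy you have abandoned.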

There are tools that can help with this—particularly git rebase --fork-point—but they're not the most wonderful things ever. They need to be used fairly quickly, right after I pick up your rebased commits, to be effective. So I will need to know, preferably in advance, that you will be rebasing your published work—your commits that I already have—so that I am prepared to do anything I must do to rebase my work on your rebased work.
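Roughly, the idea behind the fork-point approach is to use the record of where the upstream branch used to point (its reflog) to work out which commits are really mine, then copy only those onto the new upstream tip. The Python toy below sketches that idea under heavy simplification: history is linear, and `commit`, `find_fork_point`, and `replay` are hypothetical helpers, not Git's implementation.

```python
import hashlib


def commit(store: dict, parent: str, change: str) -> str:
    """Add a commit to `store`; its ID is a hash of its parent ID and change."""
    oid = hashlib.sha1(f"{parent}\n{change}".encode()).hexdigest()
    store[oid] = (parent, change)
    return oid


def replay(store: dict, commits: list, new_base: str) -> str:
    """Copy each commit's change onto `new_base`, oldest first; return the new tip."""
    tip = new_base
    for oid in commits:
        tip = commit(store, tip, store[oid][1])
    return tip


def find_fork_point(store: dict, tip: str, upstream_reflog: list) -> tuple:
    """Walk back from `tip` to the first commit the upstream branch once pointed at.

    Returns (fork_point, my_commits), with my_commits ordered oldest first.
    """
    seen_upstream = set(upstream_reflog)
    my_commits = []
    oid = tip
    while oid and oid not in seen_upstream:
        my_commits.append(oid)
        oid = store[oid][0]
    return oid, list(reversed(my_commits))


store = {}
base = commit(store, "", "base")
a = commit(store, base, "add A")
b = commit(store, a, "add B")      # upstream tip when I first fetched
c = commit(store, b, "add C")      # my work, built on the old B

m = commit(store, base, "mainline M")
new_b = replay(store, [a, b], m)   # upstream was rebased; its tip is now the copy of B

upstream_reflog = [new_b, b]       # every tip my copy of the upstream branch has had
fork, my_commits = find_fork_point(store, c, upstream_reflog)
new_c = replay(store, my_commits, new_b)

print(fork == b)                         # True
print(my_commits == [c])                 # True
print(store[new_c] == (new_b, "add C"))  # True
```

In this model, if the old tip `b` were missing from `upstream_reflog`, the walk would not stop there and the abandoned A and B would be treated as my own commits, which is one way to picture why the recovery works best soon after the upstream rebase.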

If we have all agreed to this in advance, and we all know how to do it, then it is OK to rebase your published commits. If not, well, you might be making a lot of work for someone else, perhaps many "someone else"s, who may not know how to use the (not so great) tools for dealing with an "upstream rebase".

Josh E On

A rebase operation is potentially "destructive" precisely because it is so powerful. The power of rebase comes from the fact that it can rewrite the commit history of a particular branch. This doesn't seem like a big deal until you consider what happens in any situation where two or more people disagree on a historical record.

When you push a branch up to the server, you're not just pushing the current state of the branch, you're pushing its history as well. Any future merge or branching will take this history into account when applying/comparing changes.

If you rebase a branch that's already been pushed to the server, anyone who had already pulled the branch will now be unable to plot a clean path from their commits to yours. Thus, they will be faced with inconsistent histories and merge conflicts rippling across the repository.
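One way to picture "no clean path" is an ancestry check: a branch update is a simple fast-forward only when the old tip is an ancestor of the new tip. The Python toy below uses a hand-written parent map with made-up commit names (all hypothetical, linear history only) to show that adding a commit passes the check while a rebased branch does not.

```python
def is_ancestor(parents: dict, ancestor: str, descendant: str) -> bool:
    """Return True if `ancestor` is reachable from `descendant` via parent links."""
    oid = descendant
    while oid is not None:
        if oid == ancestor:
            return True
        oid = parents.get(oid)
    return False


# Each commit maps to its single parent; None marks the root.
parents = {
    "base": None,
    "A": "base", "B": "A",               # the branch as originally pushed
    "C": "B",                            # a new commit added on top of B
    "M": "base", "A2": "M", "B2": "A2",  # the same branch after a rebase onto M
}

print(is_ancestor(parents, "B", "C"))   # True: C extends B, so this is a fast-forward
print(is_ancestor(parents, "B", "B2"))  # False: B2's history does not contain B
```

Anyone whose work descends from B is in the second case after the rebase: their history and the rewritten branch share only `base`.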

The unfortunate souls dealing with this will likely not have warm and fuzzy feelings towards you, because you've just cost them a non-trivial amount of time and effort.

Ashish Mathew On

From the Pro Git book.

When you rebase stuff, you’re abandoning existing commits and creating new ones that are similar but different. If you push commits somewhere and others pull them down and base work on them, and then you rewrite those commits with git rebase and push them up again, your collaborators will have to re-merge their work and things will get messy when you try to pull their work back into yours.