trying to use git fetch to update my non-master branch to source


I have used the steps outlined here to successfully update my master branch of the fork. So the master branch of the fork is now even with the original source's master.

I have several different branches and I wanted to make one of them (called new_branch) also even with the original source's master. So I modified the steps outlined at the link in the following way.

git fetch upstream (step 4 at the link)

git checkout new_branch (step 5)

git merge upstream/new_branch (step 6)

Step 6 produces "merge: upstream/new_branch - not something we can merge" in the terminal.

I still went ahead with the next step.

git push origin new_branch (step 7)

After step 7 all I get is "Everything up-to-date". However, the GitHub branch new_branch still says it is 41 commits behind the source of the fork.

Is it not possible to bring your non-master branch up to date with the source of the fork?

*I ran git fetch and git branch -r to see what I have. (I did run git fetch before though)


Answer by torek

TL;DR

You need to rebase your commits, typically using upstream/master. You may then need to use git push --force-with-lease or similar to update your origin's new_branch, which is your origin/new_branch. See the long answer for details.

Long

This is all fairly complicated, so here is a capsule summary:

  • There are three Git repositories involved here. We can call them Repo-A, Repo-B, and Repo-C. Two have a URL, though, over on github.com. For sanity in terms of referring to each of the three repositories, let's use their names as seen from your repository, on your computer:
    • (no name): the local repository, which we'll just call laptop when we need a name;
    • origin: a GitHub repository to which you can write directly; and
    • upstream: a GitHub repository to which you can't write, but where you can generate a "pull request" using GitHub's ability to do that.
  • Each repository has its own branches, but all repositories share the commits—at least, those that they've seen from other Git repositories.
  • We find a commit by its hash ID. The hash ID (or object ID in Git terms), in a sense, is the commit. But these things are not user friendly.
  • So, we (humans) find a hash ID using a name. But the names are not shared across repositories. At most they're copied. The sharing, if any, is a result of updating some name. So, in an important sense, they're not shared.
  • To transfer commits between laptop and origin, in whatever direction we like, we can use git fetch and git push.
  • To transfer commits between laptop and upstream, we're only allowed to use git fetch.
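To make the three-repository picture concrete, here is a minimal sketch that builds all three on one machine. It assumes local bare repositories can stand in for the two GitHub ones; the directory names (seed, upstream.git, origin.git, laptop) and the demo identity are made up for this illustration.

```shell
set -eu
work=$(mktemp -d)   # throwaway directory; every path below is hypothetical
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# "upstream": the repository that was forked (a bare repo stands in for GitHub)
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "H"
git clone -q --bare "$work/seed" "$work/upstream.git"

# "origin": the fork, which starts out as a copy of upstream
git clone -q --bare "$work/upstream.git" "$work/origin.git"

# laptop: git clone sets up the remote named origin automatically,
# while upstream has to be added by hand
git clone -q "$work/origin.git" "$work/laptop"
git -C "$work/laptop" remote add upstream "$work/upstream.git"

remotes=$(git -C "$work/laptop" remote)
echo "$remotes"
git -C "$work/laptop" remote -v
```

The laptop repository itself has no name here; only the two other repositories get names, and those names exist only inside laptop.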

Now, keeping in mind that we find commits by some sort of name, let's note that there are lots of kinds of names. In your own repository on laptop, you have full and total control over all these names, so you can do anything you want. For your own sanity, there are certain patterns you'll want to follow.

In the repository named origin, you have rather less control over names, but we'll see how you can use git push to affect its branch names. In the repository named upstream you have essentially no control over names.

Note that I'm completely ignoring tags here. They complicate the picture a bit, and this is already quite long.

Fetch, push, and remote-tracking names

Let's talk now about git fetch and git push. I'm going to assume you know how commits work and how Git finds commits, by starting from the last commit in a branch and working backwards, and that when we git checkout some branch by name (or do the same thing with git switch) and then make new commits, Git updates the branch name to hold the new commit's hash ID. The new commit points back to the commit that was the last one in the branch, so the branch has automatically been extended by this operation. This all uses the word branch, which is quite ambiguous when using Git, in a rather cavalier and presumptive manner, assuming that the reader can figure out which of any number of different and perhaps even contradictory definitions of branch might apply.

What I won't assume here is that you know about remote-tracking names, since a lot of your original question hinges on these. So let's talk about these. Let's also define remote, since it's involved.

A remote is just a short name for another Git repository. We already see that we have two remotes in laptop: the name origin refers to your fork on GitHub, and the name upstream refers to another repository on GitHub, from which you created your fork. A remote always stores a URL, but it also acts as a prefix for these remote-tracking names.

A remote-tracking name—which Git calls a remote-tracking branch name, but look at how badly the word branch is already abused; let's give that poor word a rest here—is a name that your (laptop) Git creates and updates based on some branch name as seen in some other Git repository.

Remember, again, that a branch name—as seen in any Git repository, whether it's laptop, origin, or upstream—holds the hash ID of a tip commit. So no matter which of the three repositories we look at, we have some string of commits that ends with the most recent:

... <-F <-G <-H   <--branch

where H is the hash ID of the most recent commit on that branch. If this is all in your local laptop repository, you can see these commits with git log or some other commit viewer.
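As a small sketch of that idea, assume a throwaway repository with three commits whose messages are F, G, and H (the path, the messages, and the demo identity are made up here). The branch name resolves to exactly one hash ID, and git log walks backwards from it.

```shell
set -eu
repo=$(mktemp -d)   # hypothetical scratch repository
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master

# Three commits in a row: F, then G, then H
for msg in F G H; do
  git -C "$repo" commit -q --allow-empty -m "$msg"
done

# The branch name holds one hash ID: that of the tip commit
tip=$(git -C "$repo" rev-parse master)
tip_subject=$(git -C "$repo" log -1 --format=%s "$tip")
count=$(git -C "$repo" rev-list --count master)
echo "master -> $tip ($tip_subject), $count commits reachable"

# git log starts at the tip and works backwards through the parents
git -C "$repo" log --oneline master
```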

Your first goal is to get all the commits, which means git fetch

In order to fuss around with commits, we need to have them. You have total control over the repository on your laptop, so that's where we'd like to have them. We will get these commits with git fetch.

What git fetch needs is the name of the repository you want to get commits from. That is, we'll pick one of your remotes: origin or upstream. Your Git will use the stored URL to call up a Git over on GitHub and connect it to one of those repositories.

Like your local Git, the Git you call up—the corresponding repository, really, but let's just pretend that each Git is a person who does this sort of work—can see branch names and commit hash IDs. With git fetch, your Git asks them to report back to you all their branch names and those hash IDs. Your Git can therefore see their Git's name-and-ID pairs.

For each of these IDs, your Git can now check to see if you have this commit. If you do, great! There's nothing more to do. But if you don't, well, your Git wants this commit, and their Git is happy to give it to you. Their Git is obligated to offer you any commits that go with that commit (parents), and your Git can check to see if you have those, and ask for them too, or say "no thanks, already have that one" as appropriate. Since this repeats for every branch name that they offer,[1] you wind up having them send you every commit that they have, that you don't. At the end of this "get all the commits" process, you now have all of their commits.

We noted earlier, though, that hash IDs are terrible for humans. That's why their Git has their branch names in the first place. It would be nice, would it not, if your Git could refer to their branch names too? So that's what your Git does: it copies their names, but changes them. You have your branch names, and they have theirs. Your Git doesn't dare overwrite your branch names with their information. Instead, your Git takes their branch names and turns them into your remote-tracking names.

To do this, your Git just shoves the name of the remote, and a slash, in front of their names.[2] Their master becomes your origin/master, if we're having Git fetch from origin, or your upstream/master, if we're having Git fetch from upstream.

Once this is all done, your Git disconnects from their Git. Their Git can now update their branch names, and if and when they do, your Git's remote-tracking names are out of date. This does happen, all the time; to fix it, you just run git fetch again, to origin to update your origin/* names, and to upstream to update your upstream/* names. But while you're disconnected, you still have all the commits, and your Git remembers where their Git's branches are.
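Here is a sketch of what that looks like, using local bare repositories as stand-ins for the two GitHub ones (every path, the commit helper, and the demo identity are made up for the illustration). It also suggests why git merge upstream/new_branch in the question had nothing to merge, assuming upstream has no branch named new_branch: fetching then never creates an upstream/new_branch on laptop.

```shell
set -eu
work=$(mktemp -d)   # throwaway directory; every path below is hypothetical
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit <repo> <file> <message>: hypothetical helper that records one commit
commit() {
  echo "$3" >> "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "$3"
}

git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
commit "$work/seed" base.txt H
git clone -q --bare "$work/seed" "$work/upstream.git"
git clone -q --bare "$work/upstream.git" "$work/origin.git"
git clone -q "$work/origin.git" "$work/laptop"
git -C "$work/laptop" remote add upstream "$work/upstream.git"

# new_branch exists only on laptop; upstream gains a commit on its master
git -C "$work/laptop" checkout -q -b new_branch
commit "$work/seed" upstream.txt J
git -C "$work/seed" push -q "$work/upstream.git" master

# Fetching turns their master into our upstream/master
git -C "$work/laptop" fetch -q upstream
tracking=$(git -C "$work/laptop" branch -r)
echo "$tracking"

# There is no upstream/new_branch, because upstream has no new_branch
if git -C "$work/laptop" rev-parse --verify --quiet \
     refs/remotes/upstream/new_branch >/dev/null; then
  has_upstream_new_branch=yes
else
  has_upstream_new_branch=no
fi
echo "upstream/new_branch exists: $has_upstream_new_branch"
```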


[1] If you set up a single-branch clone, your Git only asks them about that one branch, and hence you don't get everything. The point of a single-branch clone is ... well, generally to save space and time, but they don't save that much of either one on their own, so until you are really familiar with Git, avoid single-branch clones.

[2] Technically it's more complicated than this, which sort of helps if you accidentally name your own branch origin/xyz or something. But just don't do that, and then you won't need to get into these technicalities.


When you need to send commits, you need git push

Let's say you ran git fetch origin and git fetch upstream, back to back, and now have all the commits. The Git your laptop is calling origin is under your control. This means you can send any new-to-them commits to origin.

Depending on who controls upstream and what settings they have made, though, you may not be able to send stuff directly to upstream. We'll worry about that later, because your next goal may be to update master over on origin—which your Git calls origin/master.

In fact, you already did this:

I have used the steps outlined [at https://medium.com/@sahoosunilkumar/how-to-update-a-fork-in-git-95a7daadc14e] to successfully update my master branch of the fork. So the master branch of the fork is now even with the original source's master.

Unfortunately, the stuff at that link is just a recipe. It doesn't explain why the listed commands are listed, leaving you in the situation of Cueball in this xkcd:

"No idea. Just memorize these shell commands...."

So, let's look closely at git push. While git push is as close as Git gets to the opposite of git fetch, they're not quite opposites. The biggest difference is that with git push, there is no notion of a remote-tracking name.

To use git push, you will often run it with two arguments:

  • the name of a remote: this supplies the URL, just as with git fetch; and
  • what Git calls a refspec.

The refspec is the hard part here. We can keep it simple, because a branch name works as a refspec. But we maybe shouldn't keep it this simple, because that neglects something very important.

Note that if you run git push without these two arguments, the effect depends on whether you have Git 1.x or Git 2.x. In Git 1.x, git push will probably try to push too much. In Git versions starting with Git 2.0, git push defaults to pushing just the current branch, which is usually a lot closer to what most users want.

The remote part, such as origin in git push origin, is easy. It's just the same as before: we're picking who to call up. The push command is going to send them commits, which is the obvious counterpart to git fetch, which got commits from them. What's really different is this final refspec thing.[3] So we need to define what a refspec is, and how you write one, for git push.

The simplest form of a refspec is, as we already noted, just a branch name, such as master or new_branch. For git push, this is shorthand for master:master or new_branch:new_branch respectively.

The more complicated form here has a colon in the middle: master:master for instance. The colon separates your local name for some commit from the request you plan to make to the other Git.

Let's go on now to see how git push works:

  • First, your Git calls up their Git, just as before.

  • But now, instead of having them list out their branch names so that you can get all the commits they have that you don't, you have your Git list out one commit hash ID for them. That hash ID is from the left-hand side of your refspec.

  • They look at this hash ID and see if they have this commit. If they do, great! If not, your Git and their Git go through the same kind of conversation that happened before, to figure out what commits your Git needs to send them. Your Git will send them all the commits leading up to this commit as needed. They save those somewhere,[4] then go on to the last part.

  • For the last part, your Git now gives them polite requests or forceful commands. These requests-or-commands are of the form:

    • Please, if it's OK, set your branch name ______ (fill in a name) to ______ (fill in a hash ID). Or:
    • Set your branch name ______ to _______! Or:
    • I think your branch name ______ contains hash ID ______. If so, set it to ______!

The branch name for this last request-or-command comes from the right side of the colon. The commit hash ID comes from the left side, just as with the commits-to-send. So git push origin master:master takes your master (whatever commit hash ID that is) and sends that hash ID over to their Git. Their Git determines whether they have that commit already, or not, and makes sure to get it if they need it. Then your Git asks them to set their branch name, master, to that hash ID.

There are a couple of important things to note here:

  • There's no concept of a remote-tracking name. They aren't going to alter the branch name you give them: if you say master, they will set their master. Or maybe they won't, because ...
  • They can say no. For a polite request, they'll check first that your request merely adds new commits, or at least doesn't throw any out. If that's not the case, they will say no. You'll see this as a "rejected (non-fast-forward)" error.
  • Even for a forceful command, they can still say no. If you own the GitHub repository, they will usually accept a forceful command, but GitHub added a bunch of controls to Git so that you can have them say no. And, if you don't own the GitHub repository, even more controls apply.
  • The last form is a sort of conditional command: I think; if so, I command. So they can say no to that one, with an error of the form: "You were wrong." Because this answer is long, I'm not going to go into detail on this part.

In any case, if they do say "OK, I have made the branch-name change, or created a new branch name, as you requested/commanded", your Git then says: "Aha, I should now update my remote-tracking name." So if you've convinced origin, for instance, to update their master, your Git will now update your origin/master the same way.
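A minimal sketch of a polite push with an explicit refspec follows, again with a local bare repository standing in for the GitHub fork (the paths, the commit helper, and the demo identity are all assumptions made for the illustration). After the push, three things should name the same commit: your branch, their branch, and your remote-tracking name.

```shell
set -eu
work=$(mktemp -d)   # throwaway directory; every path below is hypothetical
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit <repo> <file> <message>: hypothetical helper that records one commit
commit() {
  echo "$3" >> "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "$3"
}

git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
commit "$work/seed" base.txt H
git clone -q --bare "$work/seed" "$work/origin.git"
git clone -q "$work/origin.git" "$work/laptop"

# Make commit I on a new local branch
git -C "$work/laptop" checkout -q -b new_branch
commit "$work/laptop" feature.txt I

# Left of the colon: our name for the commit to send.
# Right of the colon: the branch name we ask origin to set.
git -C "$work/laptop" push -q origin new_branch:new_branch

ours=$(git -C "$work/laptop" rev-parse new_branch)
theirs=$(git -C "$work/origin.git" rev-parse new_branch)
tracking=$(git -C "$work/laptop" rev-parse origin/new_branch)
echo "ours=$ours theirs=$theirs origin/new_branch=$tracking"
```

Writing just new_branch instead of new_branch:new_branch would do the same thing here, since the short form is shorthand for the long one.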


[3] There is a bunch of history here. Skipping over all the history, we leave a puzzle: why do we use refspecs with push, but not with fetch? So let's fill it in a bit.

Technically, the git fetch command takes refspecs too. Before the invention of remotes and remote-tracking names, people sometimes needed (or at least very much wanted) to use refspecs with fetch. The invention of remote-tracking names got rid of most of the need, but the lack of remote-tracking names with push means we still need them with push.

Fortunately, remotes and refspecs were invented well before Git became widespread. The wrong default for git push, however, persisted through Git 1.7 and 1.8, and some people still use these versions. (Git 2.0 came out just about the same time as Git 1.9 and 1.9 does not seem to be in use.)

A lone branch name as a refspec has different meaning for fetch vs push, but since we generally don't put in refspecs when we run git fetch, we don't have to worry about that here.

[4] The receiver of a push operation stuffs incoming commits into a "quarantine" location in case they choose, in the end, to reject them. Older versions of Git lack the quarantine trick but it was pretty important for sites like GitHub.


Sometimes you want to send, but aren't allowed to git push

Here, you'll need to make a pull request. This is a GitHub feature: it's not part of Git. We won't cover it here; there are existing questions and answers for this. It's worth mentioning, though, that this makes use of the way a GitHub "fork" remembers the source repository. It does not use remote-tracking names, but rather a bunch of stuff that the GitHub folks invented.

Triangular work: you don't necessarily want to have a master branch

The purpose of a branch name, in Git, is to be able to locate a specific commit. Moreover, given a branch name, you can use that name with git checkout or git switch. This puts your local Git into a state in which you are, as git status will say, on that branch (on branch master or on branch develop or whatever). Once you're in this state, new commits you make will advance this branch name. You might have some series of commits ending at the one with hash H:

...--F--G--H   <-- new_branch (HEAD)

You do some work and run git commit and poof, you have a new commit I with parent H, and the branch name now locates the new commit:

...--F--G--H--I   <-- new_branch (HEAD)

The reason you have a master on laptop, though, is because you ran:

git clone <github url for the repository you call origin>

and when you did that, your Git called up the Git over at GitHub and copied the commits from that repository into a new repository. Then, your (laptop) Git created your own local branch name master, making it identify the same commit that your (local) Git calls origin/master as well:

...--F--G--H   <-- master (HEAD), origin/master

You then create your new feature branch, new_branch, pointing to commit H:

...--F--G--H   <-- master (HEAD), new_branch, origin/master

You check out new_branch so that HEAD attaches to that name:

...--F--G--H   <-- master, new_branch (HEAD), origin/master

and then you make your new commit I:

...--F--G--H   <-- master, origin/master
            \
             I   <-- new_branch (HEAD)

Notice how, all along, the remote-tracking name, origin/master, still identifies commit H.

Some time now passes. A bunch of new commits get added to the repository you forked. You run git remote add upstream url, where url is the URL for the repository over on GitHub that you forked. Then you run:

git fetch upstream

This has your Git call up their Git and get the new commits from them:

             J--K--...--T   <-- upstream/master
            /
...--F--G--H   <-- master, origin/master
            \
             I   <-- new_branch (HEAD)

Right now this is what you are doing (well, have already done):

  • First, git checkout master attaches your HEAD to your master.

  • Next, git merge upstream/master has Git figure out that it can move your branch name master forward to point directly to commit T (this is a tricky thing that git merge does; we haven't covered it here). The result looks like this:

                 J--K--...--T   <-- master (HEAD), upstream/master
                /
    ...--F--G--H   <-- origin/master
                \
                 I   <-- new_branch
    

    Note how no commits have changed. There are no new commits. All we did was move one label, so that branch name master now points to commit T.

  • Last, git push origin master ends up sending commits J-K-...-T to your GitHub fork, and then asks them to set their master to point to commit T. Since this is just an add-on (they still don't know about commit I but don't care because commits J through T simply add on), they accept the polite request, and your Git updates your origin/master.

The result after that last step is this, in your local (laptop) Git:

             J--K--...--T   <-- master (HEAD), origin/master, upstream/master
            /
...--F--G--H
            \
             I   <-- new_branch
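The three steps can be replayed end to end in a sandbox. This is a sketch under the assumption that local bare repositories behave like the GitHub ones for this purpose; the paths, the commit helper, and the demo identity are made up, and upstream's new commits are reduced to just J and T.

```shell
set -eu
work=$(mktemp -d)   # throwaway directory; every path below is hypothetical
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit <repo> <file> <message>: hypothetical helper that records one commit
commit() {
  echo "$3" >> "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "$3"
}

git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
commit "$work/seed" base.txt H
git clone -q --bare "$work/seed" "$work/upstream.git"
git clone -q --bare "$work/upstream.git" "$work/origin.git"
git clone -q "$work/origin.git" "$work/laptop"
git -C "$work/laptop" remote add upstream "$work/upstream.git"

# Our own commit I on new_branch; upstream gains J and T meanwhile
git -C "$work/laptop" checkout -q -b new_branch
commit "$work/laptop" feature.txt I
commit "$work/seed" upstream.txt J
commit "$work/seed" upstream.txt T
git -C "$work/seed" push -q "$work/upstream.git" master

# The recipe: fetch, check out master, fast-forward it, push it to the fork
git -C "$work/laptop" fetch -q upstream
git -C "$work/laptop" checkout -q master
git -C "$work/laptop" merge -q upstream/master
git -C "$work/laptop" push -q origin master

fork_master=$(git -C "$work/origin.git" rev-parse master)
upstream_master=$(git -C "$work/upstream.git" rev-parse master)
tracking=$(git -C "$work/laptop" rev-parse origin/master)
git -C "$work/laptop" log --oneline --graph --all
```

The graph printed at the end should have the same shape as the drawing: master, origin/master, and upstream/master at T, with new_branch off to the side at I.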

But: Suppose we delete the name master entirely. We'll start with the same starting point as before, minus the one branch name:

...--F--G--H   <-- origin/master
            \
             I   <-- new_branch (HEAD)

We'll do git remote add if needed and then git fetch upstream if needed (we already did both, but let's just pretend we needed to) to get:

             J--K--...--T   <-- upstream/master
            /
...--F--G--H   <-- origin/master
            \
             I   <-- new_branch (HEAD)

Now, instead of any checkouts, we will just run this:

git push origin upstream/master:master

The name on the left locates the commit we want to send. This is commit T: the last commit on what we're calling upstream/master, which is what upstream's Git calls master.

The name on the right, master, is the name we're going to ask origin to set.

The same commits flow to origin as before. They now have the commits up to and including T (but they don't have I), just as before. Then we ask them to set their master to point to T, and they do, and we update our origin/master:

             J--K--...--T   <-- origin/master, upstream/master
            /
...--F--G--H
            \
             I   <-- new_branch (HEAD)

The end result is mostly the same. We did not have to git checkout master locally though, so we're still on new_branch.
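The no-checkout variant can be sketched the same way, with local bare repositories standing in for GitHub (paths, helper, and identity are again made up, and upstream's new commits are reduced to a single T). The local master is deleted first, to show that it really isn't needed.

```shell
set -eu
work=$(mktemp -d)   # throwaway directory; every path below is hypothetical
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit <repo> <file> <message>: hypothetical helper that records one commit
commit() {
  echo "$3" >> "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "$3"
}

git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
commit "$work/seed" base.txt H
git clone -q --bare "$work/seed" "$work/upstream.git"
git clone -q --bare "$work/upstream.git" "$work/origin.git"
git clone -q "$work/origin.git" "$work/laptop"
git -C "$work/laptop" remote add upstream "$work/upstream.git"

# Work on new_branch and drop the local master entirely
git -C "$work/laptop" checkout -q -b new_branch
commit "$work/laptop" feature.txt I
git -C "$work/laptop" branch -q -D master

# upstream moves ahead to T
commit "$work/seed" upstream.txt T
git -C "$work/seed" push -q "$work/upstream.git" master
git -C "$work/laptop" fetch -q upstream

# Send the commit our upstream/master names; ask origin to set its master
git -C "$work/laptop" push -q origin upstream/master:master

current=$(git -C "$work/laptop" symbolic-ref --short HEAD)
fork_master=$(git -C "$work/origin.git" rev-parse master)
upstream_tip=$(git -C "$work/laptop" rev-parse upstream/master)
tracking=$(git -C "$work/laptop" rev-parse origin/master)
echo "still on $current; origin master is $fork_master"
```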

Rebase

The one remaining problem we have is that our commit I has, as its parent, commit H. We cannot change this! Existing commit I exists, and has a hash ID. That commit is forever set in stone. But what we can do is have our Git compare the snapshot in commit I to that in commit H, to see what we changed, and make the same changes as a new commit, that comes after commit T.

In other words, Git makes a new commit by copying commit I. Let's call this new-and-improved I, I' (eye-prime), and draw it in:

                          I'  <-- ???
                         /
             J--K--...--T   <-- origin/master, upstream/master
            /
...--F--G--H
            \
             I   <-- new_branch (HEAD)

This process, of copying some commit, is a cherry-pick operation in Git, and you can do it with git cherry-pick. If you have several commits in a row to copy, though, the fast and easy way to do this is with git rebase. As a bonus feature, git rebase follows up the copying with a branch-name-move. Note how in the diagram above, we don't have a name by which to find our new copy I'. Rebase takes care of that by ripping the current branch name off the last commit we copy and making it point to that last copy. In this case, we can draw that as:

                          I'  <-- new_branch (HEAD)
                         /
             J--K--...--T   <-- origin/master, upstream/master
            /
...--F--G--H
            \
             I   [abandoned]

We do lose the easy way to find the hash ID of commit I, but that's because we want our Git to abandon the old commit(s) in favor of the new-and-improved ones.

That's the essence of rebasing. While I won't go into detail (because again this is so long), to achieve this kind of operation, we generally just run:

git rebase <target>

In this case the target is commit T, which we can find using the name upstream/master. If we've updated origin's master so that our origin/master locates commit T too, we can use git rebase origin/master, but git rebase upstream/master will work fine.

If we've kept our master branch name, and updated it, we can use git rebase master as well. The key is that we need to tell git rebase to locate commit T. Any name that finds commit T will be fine, here. In fact, we can even use T's hash ID, if we want to cut-and-paste it with the mouse for instance.
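A sandbox sketch of the rebase itself follows, assuming the simple case where commit I and the upstream commits touch different files, so that nothing conflicts. The paths, the commit helper, and the demo identity are made up for the illustration, and upstream's new commits are reduced to J and T.

```shell
set -eu
work=$(mktemp -d)   # throwaway directory; every path below is hypothetical
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit <repo> <file> <message>: hypothetical helper that records one commit
commit() {
  echo "$3" >> "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "$3"
}

git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
commit "$work/seed" base.txt H
git clone -q --bare "$work/seed" "$work/upstream.git"
git clone -q --bare "$work/upstream.git" "$work/origin.git"
git clone -q "$work/origin.git" "$work/laptop"
git -C "$work/laptop" remote add upstream "$work/upstream.git"

# Commit I sits on top of H
git -C "$work/laptop" checkout -q -b new_branch
commit "$work/laptop" feature.txt I
old=$(git -C "$work/laptop" rev-parse new_branch)

# upstream gains J and T, which we fetch
commit "$work/seed" upstream.txt J
commit "$work/seed" upstream.txt T
git -C "$work/seed" push -q "$work/upstream.git" master
git -C "$work/laptop" fetch -q upstream

# Copy I to I' on top of T and move new_branch to the copy
git -C "$work/laptop" rebase -q upstream/master

new=$(git -C "$work/laptop" rev-parse new_branch)
parent=$(git -C "$work/laptop" rev-parse "new_branch~1")
target=$(git -C "$work/laptop" rev-parse upstream/master)
echo "I was $old, I' is $new, its parent is $parent"
```

The old hash ID and the new one differ, which is the whole point: I' is a different commit that happens to make the same change.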

Now, if you ever ran git push origin new_branch before, you have already sent commit I to the Git over at origin. If you now attempt to send commit I' to origin and get the Git over at origin to point their name new_branch to commit I', they will say no!

The reason they say no is that this polite request is asking them to discard some original commit(s) in favor of their new-and-improved replacements. Your Git already did this, when you ran git rebase. But now you need to get their Git, over at origin, to do the same. This means you must use a force push operation.

You may have read that force-pushes are bad. They are. But we're faced with two bad alternatives: you can leave your pull request behind, or you can abandon the original commits in favor of these new-and-supposedly-improved replacements. If you abandon the originals, you need to convince everyone else who has copied these commits into their Git repositories to do the same.

Some people prefer to leave the pull request behind. There's nothing fundamentally wrong with that either. But if your company, or your personal preference, dictates that you should rebase, do it, and then use git push --force or git push --force-with-lease. (To find out the precise difference between these two, search Stack Overflow.)
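To round things off, here is a sketch of the whole sequence in a sandbox: push I, rebase, watch the polite push get refused, then force-push with a lease. It assumes local bare repositories behave like the GitHub ones and that nobody else has pushed to new_branch in the meantime; the paths, the commit helper, and the demo identity are made up.

```shell
set -eu
work=$(mktemp -d)   # throwaway directory; every path below is hypothetical
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# commit <repo> <file> <message>: hypothetical helper that records one commit
commit() {
  echo "$3" >> "$1/$2"
  git -C "$1" add "$2"
  git -C "$1" commit -q -m "$3"
}

git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
commit "$work/seed" base.txt H
git clone -q --bare "$work/seed" "$work/upstream.git"
git clone -q --bare "$work/upstream.git" "$work/origin.git"
git clone -q "$work/origin.git" "$work/laptop"
git -C "$work/laptop" remote add upstream "$work/upstream.git"

# Commit I on new_branch, already pushed to the fork
git -C "$work/laptop" checkout -q -b new_branch
commit "$work/laptop" feature.txt I
git -C "$work/laptop" push -q origin new_branch

# upstream moves on; the rebase replaces I with its copy I'
commit "$work/seed" upstream.txt T
git -C "$work/seed" push -q "$work/upstream.git" master
git -C "$work/laptop" fetch -q upstream
git -C "$work/laptop" rebase -q upstream/master

# A polite push would make origin drop I, so it is refused
if git -C "$work/laptop" push -q origin new_branch 2>/dev/null; then
  polite=accepted
else
  polite=rejected
fi

# Conditional command: replace I with I' only if origin still has I
git -C "$work/laptop" push -q --force-with-lease origin new_branch

forced=$(git -C "$work/origin.git" rev-parse new_branch)
mine=$(git -C "$work/laptop" rev-parse new_branch)
echo "polite push: $polite; origin new_branch now at $forced"
```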