git push and unreferenced objects


Without running git prune or git gc, will git push upload any unreferenced objects? Imagine this commit history:

A <= B <= C <= D <= E

where commit C added a new file and commit D deleted it. Running git rebase --onto B D will now result in:

A <= B <= E

and that file is still in .git/objects, because it is referenced by commits C and D, which are no longer on the branch. What happens in these two cases?

  1. git push <remote> <branch>: will the remote now contain the deleted file, because the file object is still there?

  2. A pull request to the main upstream that the remote was forked from: if the answer to 1 is yes, will that file be merged into upstream even though C and D were never merged upstream?

Edit: this question complements the case discussed in "Removing unreferenced objects from remote".


There are 2 answers

Answer by torek (best answer)

In general, git push won't push any unreferenced objects.

There could be specific cases / optimizations where it might do so, because there's never been any explicit promise about this. But in practice, it doesn't.

Note that after your rebase, the local repository has a new (different hash ID) commit E':

          C--D--E   [reflog / ORIG_HEAD access only]
         /
...--A--B
         \
          E'  <-- somebranch (HEAD)
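The diagram above can be checked with a small reachability walk. This is only a toy model, not how Git stores anything: the object names (`tree-A`, `blob-bigfile`, and so on) are made up for illustration and stand in for real hash IDs.

```python
# Toy model of the repository after the rebase.
# Names are illustrative placeholders, not real Git hash IDs.
# commit -> (parent commits, tree/blob objects used by that commit's snapshot)
commits = {
    "A":  ([],    {"tree-A", "blob-readme"}),
    "B":  (["A"], {"tree-B", "blob-readme"}),
    "C":  (["B"], {"tree-C", "blob-readme", "blob-bigfile"}),  # file added
    "D":  (["C"], {"tree-D", "blob-readme"}),                  # file deleted
    "E":  (["D"], {"tree-E", "blob-readme", "blob-e"}),
    "E'": (["B"], {"tree-E2", "blob-readme", "blob-e"}),       # rebased copy of E
}


def reachable(tips):
    """Return (commits, objects) reachable by walking parents from the tips."""
    seen_commits, objects = set(), set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen_commits:
            continue
        seen_commits.add(commit)
        parents, snapshot_objects = commits[commit]
        objects |= snapshot_objects
        stack.extend(parents)
    return seen_commits, objects


branch_commits, branch_objects = reachable(["E'"])
print(sorted(branch_commits))            # -> ['A', 'B', "E'"]
print("blob-bigfile" in branch_objects)  # -> False
```

In this model the blob for the added-then-deleted file is still in the store (via C), but nothing reachable from somebranch refers to it.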

When you run git push <othergit> somebranch to some other Git, the other Git presents its branch tip commit hash IDs to your Git, and your Git presents the hash ID of commit E' to them. They obviously don't have E' yet since you just made it yourself, so they say they want it (or don't have it), and your Git presents B to them; if they don't have that, they'll take that commit as well, and A as well if needed, and so on backwards through history.

At some point, your Git reaches some commit that they do have, or runs out of commit hash IDs to send. Your two Gits now agree about what is to be sent, and, as a result of these negotiations, your Git knows which commits they already have, and from that, which tree and blob objects they have as well (implied by them having, e.g., commit A and therefore all earlier commits as well).
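The backwards walk described here can be sketched roughly as follows. This is a simplification under stated assumptions: the real protocol exchanges hash IDs over the wire, while this sketch simply assumes the sender already knows the set `remote_has` (a hypothetical name) and that having a commit implies having all of its ancestors.

```python
# Parent links for the example history; commit names stand in for hash IDs.
parents = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"],
    "E'": ["B"],
}


def commits_to_send(local_tip, remote_has):
    """Walk back from local_tip, stopping at commits the remote already has."""
    to_send, seen = [], set()
    stack = [local_tip]
    while stack:
        commit = stack.pop()
        if commit in seen or commit in remote_has:
            continue  # already handled, or the remote has it (and its ancestors)
        seen.add(commit)
        to_send.append(commit)
        stack.extend(parents[commit])
    return to_send


print(commits_to_send("E'", {"A"}))       # -> ["E'", 'B']
print(commits_to_send("E'", {"A", "B"}))  # -> ["E'"]
print(commits_to_send("E'", set()))       # -> ["E'", 'B', 'A']
```

Whatever the remote already has, the walk starts at E' and follows parent links only, so C and D never enter the list.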

Your Git now (usually [1]) prepares a so-called thin pack. This is where you see the "Counting objects" and "Compressing objects" messages. The thin pack contains only those objects they will need to reconstruct the commits you're sending: in our particular example, commits E' and B, for instance. That includes tree and blob objects that they don't have (those that aren't implied by the presence of commit A), but not tree and blob objects that they do have.

This is what makes the pack a "thin" pack: a thin pack is allowed to do delta-compression against missing objects. Let's say commit A has some file that is represented by a 10 megabyte blob object, and commit B and/or E' has some file that is not 100% identical, but shares 99% of that 10 megabyte object. The thin pack's new object can be delta-compressed, saying take 9.9 MB from object _____ (fill in the blank with a hash ID) and add these remaining 100 kB. A regular pack would have to include this "base object", but a thin pack doesn't.
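To make the "take most of it from the base, add the rest" idea concrete, here is a scaled-down sketch (10,000 bytes instead of 10 MB). The copy/insert instruction format is invented for illustration and is not Git's actual delta encoding; it only finds a shared prefix and suffix.

```python
def make_delta(base: bytes, target: bytes):
    """Encode target as copy/insert instructions against base.

    Simplified: only a common prefix and a common suffix are reused.
    """
    limit = min(len(base), len(target))
    prefix = 0
    while prefix < limit and base[prefix] == target[prefix]:
        prefix += 1
    suffix = 0
    while (suffix < limit - prefix
           and base[len(base) - 1 - suffix] == target[len(target) - 1 - suffix]):
        suffix += 1
    middle = target[prefix:len(target) - suffix]
    return [
        ("copy", 0, prefix),                   # bytes taken from the base
        ("insert", middle),                    # new bytes carried in the pack
        ("copy", len(base) - suffix, suffix),  # bytes taken from the base
    ]


def apply_delta(base: bytes, delta):
    """Rebuild the target; this only works if the receiver has the base."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


base = bytes(10_000)                 # the blob the receiver already has
target = base[:9_900] + b"n" * 100   # 99% identical, 1% new
delta = make_delta(base, target)
new_bytes = sum(len(op[1]) for op in delta if op[0] == "insert")
print(new_bytes)                           # -> 100
print(apply_delta(base, delta) == target)  # -> True
```

A thin pack would ship only the delta and rely on the receiver supplying `base`; a regular pack would have to carry `base` as well.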

The receiving Git must:

  • take the incoming thin pack
  • inspect the incoming commits, and decide whether to accept them
  • if they're accepted, "fix" the thin pack or convert the objects to loose (unpacked) objects.

The receiving Git now has all necessary objects for the new commits, either as loose objects or in a new fixed-up, no-longer-thin pack. Assuming the latter, this no-longer-thin pack is stored in that repository, so the new objects (plus perhaps some retrieved objects from other packs, if needed) are all in that repository now, in this now-regular pack.
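The "fix" step can be pictured as adding the missing delta bases to the pack so that it becomes self-contained. The sketch below assumes a made-up in-memory pack layout (a dict of `("full", data)` and `("delta", base_id, ops)` entries) and a hypothetical `object_store` on the receiving side; it is not Git's on-disk format.

```python
def missing_bases(pack):
    """IDs of delta bases that the pack refers to but does not contain."""
    return {
        entry[1]
        for entry in pack.values()
        if entry[0] == "delta" and entry[1] not in pack
    }


def fix_thin_pack(pack, object_store):
    """Return a self-contained copy of the pack.

    Missing bases are copied in from the receiver's existing objects.
    """
    fixed = dict(pack)
    for base_id in missing_bases(pack):
        fixed[base_id] = ("full", object_store[base_id])
    return fixed


# Receiver already has blob-A; the incoming thin pack deltas blob-B against it.
object_store = {"blob-A": b"old contents"}
thin_pack = {"blob-B": ("delta", "blob-A", [("copy", 0, 4), ("insert", b"new")])}

fixed_pack = fix_thin_pack(thin_pack, object_store)
print(missing_bases(thin_pack))   # -> {'blob-A'}
print(missing_bases(fixed_pack))  # -> set()
```

After fixing, every delta in the pack has its base in the same pack, which is what lets the receiver store it as a regular pack.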

(At some point it becomes profitable to repack the packs. This part gets quite complicated.)


[1] This depends on the protocol used to talk between your Git and their Git. The other option is to upload each object one at a time, which tends to be terribly wasteful in terms of bytes sent over the network, so people generally don't use the old protocols now.

Answer by matt

When you push a branch, only commits currently on that branch (i.e., reachable from the branch tip) are transferred to the remote.