Without running git prune or git gc, will git push upload any unreferenced objects?
Imagine this commit history:
A <= B <= C <= D <= E
where commit C added a new file and commit D deleted it. Running git rebase --onto B D will now result in:
A <= B <= E
and that file's blob is still in .git/objects, since it is still reachable from the two detached commits C and D. What happens in these two cases?
1. git push <remote> <branch>: will the remote now contain the deleted file, because the file object is still there?
2. A pull request to the main upstream that the remote was forked from: if the answer to 1 is yes, will that file be merged into upstream even if C and D were never merged with upstream?
Edit: this question complements the case discussed in "Removing unreferenced objects from remote".
In general, git push won't push any unreferenced objects. There could be specific cases or optimizations where it might do so, because there has never been any explicit promise about this. But in practice, it doesn't.
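One way to check this claim is to reproduce the question's history in a scratch directory and ask the remote whether it has the deleted file's blob. The following is only a sketch: the names `remote.git`, `local`, `topic`, `big.bin`, and the helper functions are made up for illustration, and the push goes to a local bare repository rather than over a network.

```shell
#!/bin/sh
# Sketch: build A <= B <= C <= D <= E, drop C and D with a rebase, push,
# then test whether the remote received the blob that only C contained.
set -eu
cd "$(mktemp -d)"

# Fixed identity so commits work even without a global Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q --bare remote.git
git init -q local
cd local
git symbolic-ref HEAD refs/heads/topic   # do not depend on the default branch name

# Hypothetical helper: one commit that adds a small file named after the commit.
mk_commit() { echo "$1" > "$1.txt"; git add -A; git commit -q -m "$1"; }

mk_commit A
mk_commit B
B=$(git rev-parse HEAD)

echo "this file exists only in commit C" > big.bin
blob=$(git hash-object big.bin)          # hash ID of the file's blob
git add big.bin
git commit -q -m C

git rm -q big.bin
git commit -q -m D
D=$(git rev-parse HEAD)

mk_commit E

# Copy E onto B, leaving C, D and the old E unreachable from the branch.
git rebase -q --onto "$B" "$D"

git push -q ../remote.git topic

# Hypothetical helper: does the repository at $1 contain the blob?
has_blob() {
    if git -C "$1" cat-file -e "$blob" 2>/dev/null; then echo yes; else echo no; fi
}
local_has_blob=$(has_blob .)
remote_has_blob=$(has_blob ../remote.git)
echo "local has blob:  $local_has_blob"
echo "remote has blob: $remote_has_blob"
```

The blob stays in the local object store (the abandoned commits are still kept alive by the reflog), while the check against the remote shows whether the push carried it along.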
Note that after your rebase, the local repository has a new commit E' (with a different hash ID) in place of E; the original C, D, and E are no longer reachable from the branch.
When you run git push <othergit> somebranch to some other Git, the other Git presents its branch tip commit hash IDs to your Git, and your Git presents the hash ID of commit E' to them. They obviously don't have E' yet, since you just made it yourself, so they say they want it (or don't have it), and your Git presents B to them; if they don't have that, they'll take that commit as well, and A as well if needed, and so on backwards through history.
At some point, your Git reaches some commit that they do have, or runs out of commit hash IDs to send. Your two Gits now agree about what is to be sent, and, as a result of these negotiations, your Git knows which commits they already have, and from that, which tree and blob objects they have as well (implied by them having, e.g., commit A and therefore all earlier commits as well).
Your Git now (usually [1]) prepares a so-called thin pack. This is where you see the "counting objects" and "compressing objects" stuff. The thin pack contains only those objects they will need to reconstruct the commits you're sending: in our particular example, commits E' and B, for instance. That includes tree and blob objects that they don't have (those not implied by the presence of commit A), but not tree and blob objects that they do have.
What makes the pack "thin" is that it is allowed to delta-compress against objects that are missing from the pack itself. Let's say commit A has some file that is represented by a 10 megabyte blob object, and commit B and/or E' has some file that is not 100% identical, but shares 99% of that 10 megabyte object. The thin pack's new object can be delta-compressed, saying: take 9.9 MB from object _____ (fill in the blank with a hash ID) and add these remaining 100 kB. A regular pack would have to include this "base object", but a thin pack doesn't.
The receiving Git must therefore do one of two things: explode the thin pack into individual loose objects, or "fix" it by adding the missing base objects so that it becomes a regular pack.
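The object selection described above (everything reachable from the new tip, minus what the receiver is known to have) can be approximated locally with git rev-list --objects. This is a sketch, not the actual wire negotiation: it simply assumes the receiver already has commit A, and the repository and file names are invented for the example.

```shell
#!/bin/sh
# Sketch: list the objects a push would need to send if the receiver
# already has commit A and we are sending B and E'.
set -eu
cd "$(mktemp -d)"

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/topic

echo one > a.txt
git add a.txt
git commit -q -m A
A=$(git rev-parse HEAD)                 # assumption: the receiver has this commit

echo two > b.txt
git add b.txt
git commit -q -m B

echo three > e.txt
git add e.txt
git commit -q -m "E'"

# Commits, trees and blobs reachable from the tip but not from A.
to_send=$(git rev-list --objects topic --not "$A")
echo "$to_send"

# The blob for a.txt is implied by commit A, so it should not be listed.
a_blob=$(git rev-parse "$A:a.txt")
if echo "$to_send" | grep -q "$a_blob"; then
    a_blob_listed=yes
else
    a_blob_listed=no
fi
echo "a.txt blob listed: $a_blob_listed"
```

Each output line is an object hash ID, followed by a path for trees and blobs. Objects that exist only in abandoned commits never appear here either, because the walk starts from the tip being pushed.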
The receiving Git now has all necessary objects for the new commits, either as loose objects or in a new fixed-up, no-longer-thin pack. Assuming the latter, this no-longer-thin pack is stored in that repository, so the new objects (plus perhaps some retrieved objects from other packs, if needed) are all in that repository now, in this now-regular pack.
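Which of the two outcomes you get (loose objects or a stored pack) is governed on the receiving side by the receive.unpackLimit setting: pushes with fewer objects than the limit are unpacked into loose objects, larger ones are kept as a pack. The sketch below pushes the same commit to two hypothetical bare repositories, `kept.git` and `exploded.git`, configured with deliberately extreme limits, and then inspects each with git count-objects -v.

```shell
#!/bin/sh
# Sketch: show both ways a receiver can store what was pushed.
set -eu
cd "$(mktemp -d)"

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q --bare kept.git
git init -q --bare exploded.git
git -C kept.git config receive.unpackLimit 1          # keep every pushed pack
git -C exploded.git config receive.unpackLimit 1000   # unpack small pushes

git init -q local
cd local
git symbolic-ref HEAD refs/heads/topic
echo hello > f.txt
git add f.txt
git commit -q -m A

git push -q ../kept.git topic
git push -q ../exploded.git topic

# Hypothetical helper: pull one field out of "git count-objects -v".
field() { git -C "$1" count-objects -v | awk -v key="$2:" '$1 == key { print $2 }'; }

kept_packs=$(field ../kept.git packs)
kept_loose=$(field ../kept.git count)
exploded_packs=$(field ../exploded.git packs)
exploded_loose=$(field ../exploded.git count)

echo "kept.git:     packs=$kept_packs loose=$kept_loose"
echo "exploded.git: packs=$exploded_packs loose=$exploded_loose"
```

In `git count-objects -v` output, `count` is the number of loose objects and `packs` is the number of pack files, so the two repositories should differ in exactly those fields.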
(At some point it becomes profitable to repack the packs. This part gets quite complicated.)
[1] This depends on the protocol used to talk between your Git and their Git. The other option is to upload each object one at a time, which tends to be terribly wasteful in terms of bytes sent over the network, so people generally don't use the old protocols now.