We are using git as our DVCS for a very large project (yes, I know git is not always considered the best fit for this kind of situation), and there's something I don't quite understand about my repo.
This is my count-objects output:
count: 53
size: 1.57 MiB
in-pack: 26444
packs: 2
size-pack: 42.49 GiB
prune-packable: 0
garbage: 8
size-garbage: 32.22 GiB
As you can see, size is less than 2 MiB and size-pack is about 43 GiB (what is this, exactly?), but size-garbage is 32 GiB! What is that? Can I remove it? How?
On a separate copy of the repository I tried several options found on the internet, with a very poor understanding of what they do, and got basically no gains or major changes. For example:
git reflog expire --all --expire=now
git gc --prune=now --aggressive
git gc
git repack -a -d --depth=250 --window=250
git reflog expire --all --expire=now
You typically don't want to use this as an end user, but it can be useful if you introduced a massive amount of data and ended up pulling it out of the branch (for example, resetting back to a previous commit and creating a new commit without the data). In that case you'd still have an active reference to the data in the reflog, and git would keep the objects around until those reflog entries expired.
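To make the reflog point concrete, here is a toy mark-and-sweep model in Python. It is not how Git is implemented: the `ToyRepo` class, its fields and the object ids are hypothetical, and age-based expiry is left out entirely (as if everything were expired "now"). It only shows that an object dropped from the branch stays reachable, and therefore survives pruning, while a reflog entry still points at the commit that introduced it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Hypothetical toy model of an object store; not Git's implementation."""

    # object id -> ids it points to (commit -> parent/tree, tree -> blobs)
    edges: dict[str, list[str]] = field(default_factory=dict)
    # branch name -> commit id
    refs: dict[str, str] = field(default_factory=dict)
    # old values of refs, standing in for the entries under .git/logs
    reflog: list[str] = field(default_factory=list)

    def reachable(self) -> set[str]:
        """Mark phase: every object reachable from a ref or a reflog entry."""
        stack = list(self.refs.values()) + list(self.reflog)
        seen: set[str] = set()
        while stack:
            oid = stack.pop()
            if oid not in seen:
                seen.add(oid)
                stack.extend(self.edges.get(oid, []))
        return seen

    def prune(self) -> set[str]:
        """Sweep phase: delete unreachable objects (no age check, like --prune=now)."""
        dropped = set(self.edges) - self.reachable()
        for oid in dropped:
            del self.edges[oid]
        return dropped


# c2 added a huge blob; the branch was then reset to c1 and c3 committed on top.
repo = ToyRepo(
    edges={
        "c1": ["t1"], "t1": ["small"], "small": [],
        "c2": ["c1", "t2"], "t2": ["small", "huge"], "huge": [],
        "c3": ["c1", "t3"], "t3": ["small"],
    },
    refs={"main": "c3"},
    reflog=["c1", "c2", "c3"],
)

first = repo.prune()    # c2 is still in the reflog, so "huge" survives
repo.reflog.clear()     # roughly the effect of expiring every reflog entry now
second = repo.prune()   # now c2, t2 and "huge" are unreachable and get dropped
print(sorted(first), sorted(second))
```

The model keeps the reflog as just another set of roots, which is the essential reason the cleanup needs both steps: expiring the reflog removes the roots, and pruning then removes what they were keeping alive.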
git gc --prune=now --aggressive
Avoid the --aggressive option. In my experience it does far too much work for far too little gain. However, the variant git gc --prune=now is probably what you want. It removes all unreferenced data right away. By default, git keeps unreferenced data around for a couple of weeks, and that can add up quickly. It also repacks the repository, which can save quite a bit of space compared to loose objects (each loose object is zlib-compressed on its own, without the delta compression that packs use).
git gc
This form of git gc can be useful too. It still prunes, but uses the default expiry period instead. Anything younger than that period is left as a loose object, so if your goal is to reclaim disk space, I would not recommend this form.
git repack -a -d --depth=250 --window=250
This one has been around for a while, but it is usually applied in the wrong situations. It is only really helpful if you converted your repository from another version control system and the conversion process did not choose good delta bases when packing. It is also very expensive to compute and takes a long time to finish. In repositories that were converted with poor delta choices it can make a huge difference, but it's probably not what you are looking for.
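A rough Python sketch of why packing saves space compared to loose objects. `loose_object` follows Git's loose-object layout (a `<type> <size>\0` header plus the content, zlib-compressed as one file per object); `make_versions` is a hypothetical file history in which each version differs from the previous one by a single byte. Compressing all versions in one zlib stream is only a crude stand-in for the explicit deltas real packfiles store, but it shows how much is lost when near-identical objects are compressed one at a time.

```python
import random
import zlib


def loose_object(kind: str, data: bytes) -> bytes:
    """Mimic a loose object file: "<type> <size>\\0" header + content, zlib-compressed alone."""
    header = f"{kind} {len(data)}\0".encode()
    return zlib.compress(header + data)


def make_versions(n: int, size: int, seed: int = 0) -> list[bytes]:
    """Hypothetical file history: each version changes one byte of the previous one."""
    rng = random.Random(seed)
    current = bytearray(rng.randbytes(size))
    versions = []
    for _ in range(n):
        current[rng.randrange(size)] = rng.randrange(256)
        versions.append(bytes(current))
    return versions


versions = make_versions(n=50, size=2000)
# Every version stored as its own loose object: zlib cannot see the other versions.
loose_total = sum(len(loose_object("blob", v)) for v in versions)
# Crude stand-in for packing: one zlib stream lets later versions reference earlier ones.
# Real packfiles store explicit deltas against chosen base objects instead.
packed_total = len(zlib.compress(b"".join(versions)))
print(f"loose: {loose_total} bytes, one stream: {packed_total} bytes")
```

The random content is deliberate: it barely compresses on its own, so nearly all of the saving comes from reusing earlier versions, which is the same effect delta compression exploits (and which a poorly chosen set of delta bases would waste).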
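size is the disk space used by loose objects, i.e. objects that have not been packed yet. They may be referenced (through the reflog, branches, remotes, etc.) or unreferenced. size-pack is the space used by the packfiles, which is where most of your history lives.
The size-garbage field is the interesting one. It counts files in the object directory that are neither valid loose objects nor valid packs, for example temporary pack files left behind by an interrupted or failed gc, repack, or fetch. git gc removes leftover temporary files like these only once they are older than the prune expiry (two weeks by default); git gc --prune=now removes them regardless of their age.
So, in the end, git gc --prune=now is most likely what you want to run on the repository.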
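If you want to compare these numbers before and after running gc, a small Python helper can read them. This is a sketch: `parse_count_objects` and `to_bytes` are hypothetical names, the unit spellings in `UNITS` are an assumption about what `-H` prints, and it is fed the output quoted in the question rather than calling git itself. Without `-H`, `git count-objects -v` reports the size fields as plain numbers of KiB, which is why a bare number is multiplied by 1024.

```python
import re

# Assumed unit spellings for `git count-objects -vH`; without -H, sizes are bare KiB counts.
UNITS = {"byte": 1, "bytes": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_count_objects(text: str) -> dict[str, str]:
    """Split `git count-objects -v` output into {field name: raw value}."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def to_bytes(value: str) -> int:
    """Convert '32.22 GiB' (with -H) or a bare number of KiB (without -H) to bytes."""
    match = re.fullmatch(r"([\d.]+)\s*(\S+)?", value)
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    number, unit = float(match.group(1)), match.group(2)
    factor = UNITS[unit] if unit else 1024
    return round(number * factor)


# The output quoted in the question.
sample = """\
count: 53
size: 1.57 MiB
in-pack: 26444
packs: 2
size-pack: 42.49 GiB
prune-packable: 0
garbage: 8
size-garbage: 32.22 GiB
"""
fields = parse_count_objects(sample)
garbage = to_bytes(fields["size-garbage"])
packed = to_bytes(fields["size-pack"])
print(f"garbage is {garbage / (garbage + packed):.0%} of packs + garbage")
```

Keeping the raw strings in the parsed dict and converting only the size fields on demand avoids guessing which fields are counts and which are sizes.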