git garbage-size out of control, need understanding


We are using Git as our DVCS for a very large project (yes, I know Git is not always considered the best fit for this situation), and there's something I don't quite understand about my repo.

This is my count-objects output:

count: 53
size: 1.57 MiB
in-pack: 26444
packs: 2
size-pack: 42.49 GiB
prune-packable: 0
garbage: 8
size-garbage: 32.22 GiB

As you can see, size is less than 2 MiB and size-pack is about 42.5 GiB (what is this, exactly?), but size-garbage is 32 GiB! What is that? Can I remove it? How?

I tried many options found on the internet on a separate copy of the repository, with very little understanding of what they do, and saw basically no gains or major changes. For example:

git reflog expire --all --expire=now
git gc --prune=now --aggressive
git gc
git repack -a -d --depth=250 --window=250

There is 1 answer

Answered by John Szakmeister:
git reflog expire --all --expire=now

You typically don't want to use this as an end user, but it can be useful if you introduced a massive amount of data and then ended up pulling it out of the branch (for example, by resetting to a previous commit and creating a new commit without the data). In that case, the reflog would still hold an active reference to the data, and Git would keep the objects around until the reflog entries expired.
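To make the reflog effect concrete, here is a minimal sketch that runs in a throwaway repository. The file names, commit messages and the small stand-in for the "massive" data are all made up for illustration; it only assumes a reasonably recent Git on the PATH.

```shell
#!/bin/sh
set -eu

# Throwaway repository in a temp directory (hypothetical demo data only).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "keep me" > small.txt
git add small.txt
git commit -q -m "initial commit"

# Commit a stand-in for the "massive" data, then reset it off the branch.
echo "pretend this file is huge" > big.bin
git add big.bin
git commit -q -m "add big file by mistake"
big_blob=$(git rev-parse HEAD:big.bin)
git reset -q --hard HEAD~1

# The reflog still points at the dropped commit, so gc keeps the blob.
git gc -q --prune=now
if git cat-file -e "$big_blob" 2>/dev/null; then before=present; else before=gone; fi

# Expire every reflog entry, then collect again.
git reflog expire --all --expire=now
git gc -q --prune=now
if git cat-file -e "$big_blob" 2>/dev/null; then after=present; else after=gone; fi

echo "before reflog expiry: $before"
echo "after reflog expiry:  $after"
```

The point of the two checks is that pruning alone does not help while a reflog entry still reaches the data; expiring the reflog throws away your undo history, so only do it on a repository where you are sure you no longer need it.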

git gc --prune=now --aggressive

Avoid the --aggressive option. It does way too much work for far too little gain, in my experience. However, the variant git gc --prune=now is probably what you want. This will remove any unreferenced data. By default, Git keeps unreferenced data around for two weeks, and that can add up pretty quickly. It will also repack the repository, which can save quite a bit of space compared to loose objects (loose objects are zlib-compressed individually, but only packs store objects as deltas against each other).

git gc

This form of git gc can be useful too. It still prunes, but with the default grace period (two weeks) instead. Unreferenced objects younger than that are left in place (as loose objects, or in a cruft pack on newer Git versions), so if you're intent on reclaiming disk space, I would not recommend this form.
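The difference between the two forms can be sketched in a scratch repository. This is an illustration under assumptions, not a recipe for your real repo: the file names and the "unreferenced data" blob are invented, and it relies on the default grace period (the gc.pruneExpire setting) not having been changed.

```shell
#!/bin/sh
set -eu

# Scratch repository; nothing here touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "tracked" > file.txt
git add file.txt
git commit -q -m "initial commit"

# Write a blob that no commit, index entry or reflog refers to.
orphan=$(echo "unreferenced data" | git hash-object -w --stdin)

# Plain gc: the object is younger than the grace period, so it stays.
git gc -q
if git cat-file -e "$orphan" 2>/dev/null; then after_plain=present; else after_plain=gone; fi

# --prune=now: the grace period is ignored and the object is dropped.
git gc -q --prune=now
if git cat-file -e "$orphan" 2>/dev/null; then after_now=present; else after_now=gone; fi

echo "after git gc:             $after_plain"
echo "after git gc --prune=now: $after_now"
```

The grace period exists to protect objects that a concurrently running Git command may be about to reference, so --prune=now is best run when nothing else is using the repository.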

git repack -a -d --depth=250 --window=250

This one has been around for a while, but it is generally misapplied. It is only really helpful if you've converted your repository from another version control system and the conversion process didn't choose good delta bases when packing. It's also really expensive to compute and will take a long time to finish. In repositories that were converted with poor delta choices, it can make a huge difference. But it's not likely what you are looking for.

count: 53
size: 1.57 MiB
in-pack: 26444
packs: 2
size-pack: 42.49 GiB
prune-packable: 0
garbage: 8
size-garbage: 32.22 GiB

As you can see, size is less than 2 MiB and size-pack is about 42.5 GiB (what is this, exactly?), but size-garbage is 32 GiB! What is that? Can I remove it? How?

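size is the disk space used by loose objects, that is, objects stored as individual files rather than in a pack, and size-pack is the disk space used by the packfiles. The size-garbage field is the interesting one. According to the git count-objects documentation, it is the disk space used by files in the object database that are neither valid loose objects nor valid packs. In practice these are often leftover temporary files, such as tmp_pack_* files from an interrupted fetch or repack; git count-objects -v prints a warning naming each one. If that is what you have, git gc would remove them once they are more than two weeks old, and git gc --prune=now would remove them regardless of their age.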
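Here is a small sketch of how the garbage counter behaves, using a scratch repository and a fake leftover file. The tmp_pack_demo name and its content are invented for the demo, and it assumes your garbage is of the temporary-file kind; other kinds of stray files may need to be inspected and removed by hand.

```shell
#!/bin/sh
set -eu

# Scratch repository; nothing here touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "tracked" > file.txt
git add file.txt
git commit -q -m "initial commit"

# Simulate a leftover temporary pack (hypothetical file name and content).
echo "not a real pack" > .git/objects/pack/tmp_pack_demo

# Read the "garbage:" counter; the per-file warnings go to stderr.
garbage_count() {
    git count-objects -v 2>/dev/null | sed -n 's/^garbage: //p'
}

garbage_before=$(garbage_count)

# Prune with no grace period, then count again.
git gc -q --prune=now
garbage_after=$(garbage_count)

echo "garbage before: $garbage_before"
echo "garbage after:  $garbage_after"
```

On the real repository, running git count-objects -vH without discarding stderr, or simply listing .git/objects/pack/, should show which files account for the 32 GiB before you remove anything.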

So, in the end, it's likely git gc --prune=now that you want to run on the repository.