I have a project called geoplot that does geospatial plotting in Python. The code for it is distributed via git on GitHub. You can check it out here.
As a part of the development process for this package, I uploaded and stored in the geoplot repo a folder called data/ which contained a large number of data files in various formats. These data files were used to populate the examples in the complementary example gallery.
However, these files bloat the overall repository size way up to ~150 MiB (issue). This is clearly way too much, and it's time for me to get rid of them.
The problem is that I need to not just remove these files from the current HEAD; I also need to scrub them out of the entire git history. I tried a manual approach using git rebase, which didn't work. Then I tried the BFG Repo-Cleaner tool, as recommended in the canonical SO question on the matter.
BFG rid me of the files alright: they no longer exist anywhere in the history. However, the size of the repo (as seen when running git clone https://github.com/ResidentMario/geoplot.git) did not go down at all!
Here is what I tried (minus printouts):
java -jar ../bfg-1.12.15.jar --delete-folders "data" .
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push --set-upstream https://github.com/ResidentMario/geoplot.git master --force
The full printout is in an issue on GitHub.
What, if anything, did I do wrong? How do I diagnose the source of and expunge this wasted space?
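For the diagnosis half of that question, one common starting point is to list the largest blobs that are still reachable from any ref. The following is a minimal sketch and not something taken from the thread; the function name largest_blobs is hypothetical, and it assumes it is run from inside the repository.

```shell
#!/bin/sh
# Hypothetical helper: print the N largest blobs reachable from any ref,
# biggest first. Output columns: size in bytes, object id, path.
largest_blobs() {
    n="${1:-20}"
    # rev-list emits "<object id> <path>" for every reachable object;
    # cat-file looks up type and size and passes the path through as %(rest).
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
        awk '$1 == "blob"' |
        cut -d' ' -f2- |
        sort -k1,1nr |
        head -n "$n"
}
```

Calling largest_blobs 10 in the rewritten repository should show no paths under data/ if the history rewrite worked. If large blobs still appear, some branch or tag is still holding on to them.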
I did mention reflog and gc back in 2010, but also removing old objects. (Note: gc should be followed by a repack.)
First, check whether you still get the same size after cloning your repo again.
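The local cleanup described here can be sketched as one function: expire the reflog, garbage-collect, repack, then print the object-store size. This is only a sketch under the assumption that it is run inside the rewritten repository; the name shrink_repo is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: drop reflog entries, prune unreachable objects,
# repack what is left, and report the resulting object-store size.
shrink_repo() {
    git reflog expire --expire=now --all &&
    git gc --prune=now --aggressive --quiet &&
    git repack -a -d -q &&
    git count-objects -vH   # "size-pack" is the on-disk size of the packs
}
```

This only shrinks the local copy. Whether the hosted repository shrank is what the fresh clone tells you.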
As the OP Aleksey Bilogur mentions in the comments:
you need to make sure your tags are not referencing the old data, and then you need to force-push all the tags and branches as well (not just master).
Generated data must be removed from the repo history.
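A hedged sketch of both steps from that comment: first list every branch and tag whose history still touches a path, then force-push all branches and tags. The function names refs_touching_path and force_push_all are hypothetical, and the remote defaults to origin as an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print each branch or tag whose history still
# contains a commit touching the given path (for example "data").
refs_touching_path() {
    path="$1"
    git for-each-ref --format='%(refname)' refs/heads refs/tags |
    while read -r ref; do
        # rev-list prints a commit id only if some commit reachable
        # from the ref modified something under the path.
        if [ -n "$(git rev-list -n 1 "$ref" -- "$path")" ]; then
            printf '%s\n' "$ref"
        fi
    done
}

# Hypothetical helper: overwrite every branch and every tag on the remote,
# not just master.
force_push_all() {
    remote="${1:-origin}"
    git push --force --all "$remote" &&
    git push --force --tags "$remote"
}
```

For the repository in the question this would be refs_touching_path data, and once that prints nothing, force_push_all https://github.com/ResidentMario/geoplot.git.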