Git Remove Large Files from History

I have a huge git repository (810mb) with large files that should not be there: complete JRE archives for distribution, located in the folder build/java.

I am trying to remove those files, so I ran:

 git filter-branch --tree-filter 'rm -rf build/java' HEAD

I now see the message: "Your branch and 'origin/develop' have diverged, and have 414 and 414 different commits each, respectively. (use "git pull" to merge the remote branch into yours)"

I don't want to run git pull, but before I push to the remote repository on GitHub I want to confirm that the repository has shrunk.

Unfortunately, I still see it as 810mb.

What am I doing wrong? How can I shrink that repository?

TIA!

There are 2 answers

Sam Varshavchik On

Run

git reflog

to see the history of commits that the tip of your branch has pointed to. By default, reflog entries are kept for 90 days, or 30 days if they are no longer reachable from the current tip. Even though filter-branch rewrote your branch, the commits of the old branch are still referenced from git's reflog, and this prevents them and their parent commits from being purged, together with any files they reference.

So, if any of the unwanted files still appear anywhere in the history of those old commits, git cannot purge them.

In order to make sure that you've purged those files from the repository you must:

1) Delete your entire reflog history. Without --expire=now, only entries older than the default expiry time are removed:

git reflog expire --expire=now --all

2) Figure out whether any tag or branch still has any of the unwanted files in its history, and decide what to do about it: either delete the branch/tag, or rewrite it with the same filter.

3) Run git gc --prune=now to do garbage collection. A plain git gc keeps unreachable objects that are less than two weeks old.

This should finally remove all the dropped files from your local git repository.
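The steps above could be sketched as a few shell functions. This is only a sketch under some assumptions: the function names (refs_touching_path, purge_unreachable, repo_size) are made up for illustration, build/java is the path from the question, and --expire=now / --prune=now are used on the assumption that you want everything purged immediately rather than after git's default grace periods.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of git.

# Step 2: print every ref whose history still touches the given path.
refs_touching_path() {
    path=$1
    git for-each-ref --format='%(refname)' | while read -r ref; do
        # rev-list prints one commit that touched $path, or nothing at all.
        if [ -n "$(git rev-list -n 1 "$ref" -- "$path" 2>/dev/null)" ]; then
            echo "$ref"
        fi
    done
}

# Steps 1 and 3: drop all reflog entries, then garbage-collect right away.
purge_unreachable() {
    git reflog expire --expire=now --all
    git gc --prune=now --quiet
}

# Report the on-disk size of the object database in human-readable units.
repo_size() {
    git count-objects -vH
}
```

Run refs_touching_path build/java first. If it still prints any ref, the files are still reachable and purge_unreachable will not free them. Comparing repo_size before and after shows whether the repository actually shrank.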

Here's the bad news: when you finally push the clean branch, I'm pretty sure this won't guarantee that the unwanted files also get dropped from your GitHub repo. A push only sends out the commits in your branch. It won't necessarily cause the remote git repo to be garbage-collected. I am not familiar with GitHub's default configuration when it comes to garbage-collecting their repos, so you will have to investigate that.
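For the push itself, a rough sketch follows. Since the rewritten branch has diverged from origin/develop, an ordinary push will be rejected and some form of forced push is needed. The function name push_rewritten_branch is made up; origin and develop are the names from the question. Whether the remote then frees the old objects is a separate matter that this does not address.

```shell
#!/bin/sh
# Hypothetical helper: overwrite the remote branch with the rewritten one.
push_rewritten_branch() {
    remote=$1
    branch=$2
    # --force-with-lease refuses to overwrite the remote branch if it has
    # moved since it was last fetched; plain --force makes no such check.
    git push --force-with-lease "$remote" "$branch"
}
```

Usage would be push_rewritten_branch origin develop. Anyone else who has cloned the repository will need to re-clone or reset onto the rewritten history afterwards.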

Philippe On

First, I highly recommend using BFG Repo-Cleaner to remove big files from your repository.

Second, since you use GitHub, you should know about a feature for handling file types that can be huge: Git LFS.

"Unfortunately, I still see it as 810mb"

Indeed. When you use filter-branch, git saves a backup of every reference it rewrites under the prefix refs/original/. Until you have accepted your changes by deleting these references AND run a garbage collection, all the objects are still in the git 'database' and the size stays the same!
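As a sketch of what "accepting your changes" could look like: the function below deletes every backup reference under refs/original/, expires the reflog, and garbage-collects. The function name drop_filter_branch_backups is made up. Run something like this only once you are sure the rewritten history is what you want, because it throws the backup away.

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative, not part of git.
drop_filter_branch_backups() {
    # Delete each backup ref that filter-branch stored under refs/original/.
    git for-each-ref --format='delete %(refname)' refs/original |
        git update-ref --stdin
    # The old commits may still be referenced from the reflog, so expire it.
    git reflog expire --expire=now --all
    # Prune unreachable objects now instead of after the default grace period.
    git gc --prune=now --quiet
}
```

Afterwards, git count-objects -vH should report a smaller size, provided no other branch or tag still references the removed files.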