Remove unused assets from git history

So, I've been searching all morning for the correct way to do this, and I'm just not command-line savvy enough to figure it out.

I have a git repo with a ton of assets in it. It's like the cardinal sin, I know.

The repo has grown to be too huge. I'd like to clean it up so I can programmatically remove all files that do not exist in HEAD anymore from the entire history of the repo. I've seen ways to do this where you can specify the file paths, but really, I am talking like 1000+ files that have been removed from our final product that I really don't care to have in my repo anymore.

UPDATE: I've cleaned the repo of all the assets that shouldn't have been there in the first place. I really just have source code in there now and a few assets that SHOULD be there. I'd really love to keep all the history of all the source code, so I'm really looking to scrap the deleted files from history while preserving the history of what currently exists. That's the goal. I am pretty sure it can be done using git filter-branch, but I just don't understand it well enough.

There are 3 answers

Roberto Tyley (best answer)

Use the BFG Repo-Cleaner, a simpler, faster alternative to git-filter-branch specifically designed for removing unwanted files from Git history.

"so I can programmatically remove all files that do not exist in HEAD anymore from the entire history of the repo"

By default, the BFG 'protects' all files in your HEAD commit, but will delete other files that match your criteria.

You should carefully follow the usage instructions, but the core part is just this:

$ java -jar bfg.jar  --strip-blobs-bigger-than 1M  my-repo.git

Any files over 1MB in size - that aren't in your latest commit - will be removed from your Git repository's history. If you have normal, smaller-than-1MB, source files that you still want to remove, you can specify them with the --delete-files or --delete-folders options.
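Before picking a size threshold or a --delete-files pattern, it can help to see which paths are actually candidates. The following is only a rough sketch using plain git, not part of the BFG; the helper name list_deleted_paths is made up for illustration, and files introduced solely by merge commits may be missed because git log does not diff merges by default.

```shell
#!/bin/sh
# list_deleted_paths REPO (hypothetical helper name)
# Prints every path that was added at some point in history
# but is no longer present in HEAD.
list_deleted_paths() {
    repo=$1
    all_paths=$(mktemp) || return 1
    head_paths=$(mktemp) || return 1

    # Every path ever added on any ref. --no-renames makes a rename show up
    # as an add plus a delete, so renamed files are listed under both names.
    git -C "$repo" -c core.quotePath=false log --all --no-renames \
        --diff-filter=A --name-only --pretty=format: |
        grep -v '^$' | sort -u > "$all_paths"

    # Every path that exists in the current HEAD commit.
    git -C "$repo" -c core.quotePath=false ls-tree -r --name-only HEAD |
        sort -u > "$head_paths"

    # Lines that appear only in the first list: in history, but not in HEAD.
    comm -23 "$all_paths" "$head_paths"
    rm -f "$all_paths" "$head_paths"
}
```

Running list_deleted_paths my-repo.git prints one path per line, which you can skim to decide what to hand to the BFG.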

The BFG is typically at least 10-50x faster than running git-filter-branch, and generally easier to use.

Full disclosure: I'm the author of the BFG Repo-Cleaner.

T Percival

You could make a shallow clone of the repository and make that the new "main" repository, with the old crufty one saved somewhere else.

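git clone --depth=1 file:///path/to/oldrepo newrepo

Git ignores --depth when the source is a plain local path, so give the old repository as a file:// URL (or clone it from a remote URL). This way any files that were deleted are no longer reachable in the new clone, so they won't be stored as Git objects.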

The downside of course is that this hides file change history, but it's still accessible in your original repo.
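To check that the clone really is shallow, something like the sketch below should do. The function name shallow_copy is just an illustration; it resolves the old repo to an absolute path and clones it through a file:// URL, since --depth has no effect on a plain local path.

```shell
#!/bin/sh
# shallow_copy OLD NEW (hypothetical helper name)
# Clones only the newest commit of OLD into NEW.
shallow_copy() {
    # Resolve OLD to an absolute path so it can be used in a file:// URL.
    old_abs=$(cd "$1" && pwd) || return 1
    git clone --quiet --depth=1 "file://$old_abs" "$2"
}
```

After shallow_copy oldrepo newrepo, running git -C newrepo rev-list --count HEAD should print 1, since only the tip commit was fetched.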

DRC

Back up your data first; this is barely tested!

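git filter-branch --tree-filter '
    # Delete every file present in this commit but absent from master.
    # Reading names line by line keeps paths with spaces intact.
    git diff --name-only --diff-filter=A master | while IFS= read -r f; do
        rm -f -- "$f"
    done' --prune-empty HEAD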
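Note that the repo will not get any smaller right after the rewrite: filter-branch keeps backups of the old refs under refs/original/, and the reflog still points at the old commits. A rough sketch of the cleanup follows; the function name purge_rewritten_leftovers is made up, and it is destructive, so only run it on a repo you have backed up.

```shell
#!/bin/sh
# purge_rewritten_leftovers REPO (hypothetical helper name)
# Drops the backups filter-branch leaves behind so gc can delete old objects.
purge_rewritten_leftovers() {
    repo=$1

    # Delete the backup refs that filter-branch stores under refs/original/.
    git -C "$repo" for-each-ref --format='%(refname)' refs/original/ |
        while IFS= read -r ref; do
            git -C "$repo" update-ref -d "$ref"
        done

    # Expire all reflog entries, which otherwise keep old commits reachable.
    git -C "$repo" reflog expire --expire=now --all

    # Remove the now-unreachable objects and repack.
    git -C "$repo" gc --quiet --prune=now
}
```

Comparing git count-objects -vH before and after gives a quick idea of how much space was reclaimed.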