git lfs prune to remove files from lfs and push to origin


So here's what's happened:

  1. Accidentally committed lots of files that weren't meant to be.
  2. Did a git reset --soft HEAD~2 to get back to a commit before the accident
  3. Modified .gitignore to ignore the files
  4. Committed again and pushed to origin.

I assumed the git reset would reverse everything from the accidental commit, but after checking Bitbucket's list of Git LFS files, it seems all the LFS-tracked files from the accidental commit were pushed to LFS storage on origin. These files do not exist if I browse the source in Bitbucket.

So I tried git lfs prune, which appeared to delete roughly as many files as had been accidentally committed, then ran git lfs push origin master. I checked Bitbucket's list of Git LFS files again, but those files are still there and nothing has changed in origin.

What have I done wrong?


Chris (accepted answer)

There doesn't appear to be a standard way of doing this:

The Git LFS command-line client doesn't support pruning files from the server, so how you delete them depends on your hosting provider.

Bitbucket allows you to delete LFS files using its web UI (please read the entire linked page before proceeding):

Delete individual LFS files from your repository

It's important to understand that:

  • The delete operation described here is destructive – there's no way to recover the LFS files referenced by the deleted LFS pointer files (it's not like the git remove command!) – so you'll want to back up the LFS files first.
  • Deleting an LFS file only deletes it from the remote storage. All reference pointers stored in your Git repo will remain.
  • No branch, tag or revision will be able to reference the LFS files in future. If you attempt to check out a branch, tag or revision that includes a pointer file referencing a deleted LFS file, you'll get a download error and the check out will fail.
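The pointer files that remain in the Git history are small text files in the Git LFS pointer format: a version line, an oid line naming the object in remote storage, and a size line. That oid is what a checkout tries to download, which is why deleting the remote object breaks checkouts of any revision that still contains the pointer. A minimal parser, for illustration only (parse_lfs_pointer and LfsPointer are hypothetical names, not part of git-lfs):

```python
from dataclasses import dataclass


@dataclass
class LfsPointer:
    version: str  # spec URL, e.g. "https://git-lfs.github.com/spec/v1"
    oid: str      # "sha256:<64 hex chars>", the key of the object in LFS storage
    size: int     # size in bytes of the real file content


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the 'key value' lines of a Git LFS pointer file (illustrative sketch)."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    missing = {"version", "oid", "size"} - set(fields)
    if missing:
        raise ValueError(f"not an LFS pointer, missing: {sorted(missing)}")
    return LfsPointer(fields["version"], fields["oid"], int(fields["size"]))


pointer_text = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393\n"
    "size 12345\n"
)
pointer = parse_lfs_pointer(pointer_text)
# The pointer survives in Git even after the object with this oid is deleted remotely.
print(pointer.oid, pointer.size)
```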

A repository admin can delete Git LFS files from a repo as follows:

  1. Go to the Settings page for the repo and click Git LFS to view the list of all LFS files in that repo.
  2. Delete the LFS files using the actions menu.

Surprisingly, the only way to remove LFS files from GitHub appears to be to delete and recreate the repository, losing issues, stars, forks, and possibly other data.

Mark Adelsberger

In the initial steps you followed, I think you've just stumbled on one of the cases where git / git-lfs integration isn't always perfectly seamless.

The reset command would have moved your branch ref back. It would not have actually removed the unwanted commit (or related objects); but that normally wouldn't matter, because those objects are unreachable so would not be sent with a push. So far so good... with vanilla git.
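The difference between moving a ref and deleting objects can be sketched with a toy commit graph built from plain Python dicts (not Git's real object database; the commit names are hypothetical). After the reset, the accidental commits still exist but are no longer reachable from any ref:

```python
def reachable(commits: dict[str, list[str]], refs: dict[str, str]) -> set[str]:
    """Return every commit reachable from any ref by following parent links."""
    seen: set[str] = set()
    stack = list(refs.values())
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(commits[commit])
    return seen


# Toy history: A <- B <- C (accidental) <- D (accidental)
commits = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
refs = {"master": "D"}
print(sorted(reachable(commits, refs)))  # ['A', 'B', 'C', 'D']

# "git reset --soft HEAD~2" only moves the branch ref back to B ...
refs["master"] = "B"
# ... and the next commit, E, is created on top of B.
commits["E"] = ["B"]
refs["master"] = "E"
print(sorted(reachable(commits, refs)))  # ['A', 'B', 'E']
# C and D are still in `commits`, but nothing reaches them,
# so a plain git push has no reason to send them.
```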

BUT: The LFS objects (the real content of the large files) also weren't deleted prior to your push. AFAIK (and your experience seems to confirm this) LFS does not attempt to determine if LFS objects are reachable when pushing to the remote - which would, after all, seem to be an expensive check. Given that your LFS store is meant to house a bunch of large binary files, and that LFS is designed to mitigate the costs of having a large volume of unneeded data in the LFS store, the cost-benefit would usually favor just sending anything that's not on the server - which is what apparently happened here.
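The two upload strategies weighed in this paragraph can be contrasted with a toy model over sets of object IDs. This only illustrates the trade-off as described here; it is not a description of how the git-lfs client actually decides what to upload, and the function names and IDs are hypothetical:

```python
def push_everything_missing(local: set[str], remote: set[str]) -> set[str]:
    """Cheap strategy: upload every locally stored object the server lacks."""
    return local - remote


def push_only_referenced(local: set[str], remote: set[str],
                         referenced: set[str]) -> set[str]:
    """Costlier strategy: upload only objects referenced by the pushed commits."""
    return (local & referenced) - remote


# Hypothetical object IDs, shortened for readability.
local_store = {"oid-keep-1", "oid-keep-2", "oid-accident-1", "oid-accident-2"}
remote_store = {"oid-keep-1"}
# Objects whose pointer files appear in the commits now on the branch.
referenced = {"oid-keep-1", "oid-keep-2"}

print(sorted(push_everything_missing(local_store, remote_store)))
# ['oid-accident-1', 'oid-accident-2', 'oid-keep-2']
print(sorted(push_only_referenced(local_store, remote_store, referenced)))
# ['oid-keep-2']
```

Under the first strategy the accidental objects reach the server even though no pushed commit points at them, which matches what the question describes.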

And unless you're facing a limit on physical storage on the server, that may be ok really. No fetch or pull - short of explicitly telling LFS to send you everything, which is not intended for normal usage - is going to cause those files to be downloaded anyway.

But maybe you're running into a storage limit with your repo host. Or maybe you just want them gone; I can't say I'd blame you. Again by design, deleting the files locally and pushing does not remove them from the server. (The same is true of core git objects: you can force-push a ref to make a remote object unreachable, but physically "cleaning up" the remote is independent of any local clean-up.)

Info on removing LFS files from a bitbucket-hosted repo can be found here: https://www.atlassian.com/git/tutorials/git-lfs#deleting-remote-files

Per

For a large Unreal Engine project, we've opted to run two repositories. The first is an outer repository: a regular repository without LFS that will never be pruned.

We've then made a submodule out of the Content folder, which uses LFS. This is where Unreal Engine places all large assets.

We can then periodically reset just the repository of the Content folder and push the current state, while keeping full history of the rest of the project, especially the code.

In daily use, this means pushing first the submodule and then the outer repository, for which we've written a Bash commit script. But it means that we can simply not care how much data is put into the Content repository. The history of those assets rarely matters, so we just reset it maybe twice a year.
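The workflow in this answer could be scripted roughly as below. The original Bash script isn't shown, so this is a Python sketch under stated assumptions: it runs from the outer repository's root, the submodule lives in Content, the branch is called main, and the periodic "reset" is done with an orphan branch plus a force-push (one possible way to do it, not necessarily the one used here). The helper names are hypothetical. Rewriting history like this does not by itself free LFS storage on the server; as the other answers explain, old LFS objects still have to be deleted through the host (or the remote repository recreated).

```python
import subprocess

CONTENT_DIR = "Content"  # the LFS-backed submodule (from the answer)
BRANCH = "main"          # assumption: adjust to your branch name

Command = tuple[str, list[str]]  # (working directory, argv)


def commit_commands(message: str) -> list[Command]:
    """Daily commit: commit and push the submodule first, then the outer repo."""
    return [
        (CONTENT_DIR, ["git", "add", "-A"]),
        (CONTENT_DIR, ["git", "commit", "-m", message]),
        (CONTENT_DIR, ["git", "push", "origin", BRANCH]),
        # Staging in the outer repo records the submodule's new commit.
        (".", ["git", "add", "-A"]),
        (".", ["git", "commit", "-m", message]),
        (".", ["git", "push", "origin", BRANCH]),
    ]


def reset_content_history_commands(message: str) -> list[Command]:
    """Periodic reset: replace the submodule's history with a single snapshot commit."""
    return [
        # An orphan branch keeps the working tree but starts with no parent commits.
        (CONTENT_DIR, ["git", "checkout", "--orphan", "fresh-history"]),
        (CONTENT_DIR, ["git", "add", "-A"]),
        (CONTENT_DIR, ["git", "commit", "-m", message]),
        # Rename the orphan branch over the old branch, then overwrite the remote.
        (CONTENT_DIR, ["git", "branch", "-M", BRANCH]),
        (CONTENT_DIR, ["git", "push", "--force", "origin", BRANCH]),
    ]


def run(commands: list[Command]) -> None:
    """Execute each command; stops with CalledProcessError on the first failure."""
    for cwd, argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)
```

Usage would be run(commit_commands("Update assets")) day to day and run(reset_content_history_commands("Snapshot")) followed by a commit in the outer repo when resetting. Note that git commit exits non-zero when there is nothing to commit, so run stops at that point.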

It's a little bit inconvenient to juggle two repositories. But this balances our needs of (a) staying with Git, (b) not using Perforce and a checkout model, (c) infinite history for the code, and (d) not stressing about how much data is being committed.

daniel.gindi

For Bitbucket users, I have a solution for this that has been working for me for months: https://gist.github.com/danielgindi/db0e0a897d8d920f23e155bb5d59e9c6

Basically, while logged in to Bitbucket, you open the repo in Chrome and paste that piece of code into the developer console. It uses your authorization to delete all LFS files older than the specified time, and it takes a few seconds.
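The selection step behind "delete all LFS files older than the specified time" is a simple age filter. The sketch below shows only that step, in Python, on hypothetical records with assumed oid and uploaded fields; it makes no Bitbucket API calls and is not the gist's code, whose actual data shape isn't reproduced here:

```python
from datetime import datetime, timedelta, timezone


def older_than(files: list[dict], max_age: timedelta, now: datetime) -> list[dict]:
    """Return the records whose upload time is earlier than now - max_age."""
    cutoff = now - max_age
    return [f for f in files if f["uploaded"] < cutoff]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
# Hypothetical records standing in for the repo's LFS file list.
files = [
    {"oid": "oid-1", "uploaded": datetime(2023, 11, 20, tzinfo=timezone.utc)},
    {"oid": "oid-2", "uploaded": datetime(2024, 5, 15, tzinfo=timezone.utc)},
]
to_delete = older_than(files, timedelta(days=90), now)
print([f["oid"] for f in to_delete])  # ['oid-1']
```

Reviewing the selected list before deleting anything is worthwhile, since (as noted in the accepted answer) the deletion is irreversible.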

Important note: Never run any piece of code in the browser blindly. Look at the code, make sure you understand what it does. I can tell you "trust me", but you don't know me.