I have a Git repo that uses LFS storage. The purpose of the repo is to store and track the version history of DOCX files on a single (master) branch. I initially started using LFS to reduce the storage space needed for local copies of the repo. Sometimes it is useful for me to check out old versions of files to investigate certain issues, but I only ever need the last 6 months of history.
My LFS storage is now approaching the 50 GB limit of my purchased tier. To cut down on space, I would like to truncate the history of my repo so that everything older than 6 months is cleared. I have a local backup just in case.
What I've tried
I've looked into some Git commands (`git prune`, `git filter-branch`) as well as BFG Repo-Cleaner, but none of those seem to accomplish my goal. I can provide more details of what I've tried on request, but I want to avoid making this question too long.
Git Clone with Depth Parameter
My latest idea was to clone my repo with a depth parameter so that only the last 6 months of history are preserved, then force-push the local copy to overwrite what is on the remote.
However, when I tried force-pushing, Git told me `Everything up-to-date`, which I guess makes sense because the latest commit in my local repo still matches the latest commit on the remote. I need a way to force the remote to take my entire (truncated) repository history.
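For reference, this behavior has nothing to do with LFS: a push only compares branch tips, and a shallow clone still has the same tip commit as the remote, so Git sees nothing to send. Here is a minimal reproduction with plain Git and a throwaway local "remote" (all paths and names are made up for the illustration):

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# A bare repo standing in for the hosted remote, with two commits pushed to it
git init -q --bare remote.git
git clone -q "$tmp/remote.git" full
cd full
git config user.email you@example.com
git config user.name you
git commit -q --allow-empty -m "one"
git commit -q --allow-empty -m "two"
git push -q origin HEAD
cd ..

# Shallow clone: history is truncated locally, but the tip commit is unchanged,
# so even a forced push reports "Everything up-to-date"
git clone -q --depth 1 "file://$tmp/remote.git" shallow
git -C shallow push --force origin HEAD 2>&1
```

Note the `file://` URL: `git clone --depth` is ignored for plain local paths, so a transport URL is needed for the shallow clone to actually truncate history.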
Clear LFS and Recreate from Local Repo
Now I'm thinking maybe I should clear everything from the remote LFS repo and use my truncated local repo (cloned with a depth parameter) to recreate the remote. However, I'm worried that I'll lose all my historical files (everything except the latest version) if I clear LFS.
Do I understand LFS?
The way I understood LFS is that only the latest versions of files are downloaded locally, and all historical files are represented by pointers. When you check out an older commit, the local pointers are used to grab the full versions of those older files from remote LFS storage. However, I did a test where I cloned my repo, then disconnected from the internet and tried checking out an old commit. Without an internet connection, I was still able to check out the commit and view old versions of my files. That means my local repo must already contain the full versions of those older files, so now I'm doubting my understanding of LFS entirely.
What makes it more confusing is that my remote LFS storage is at 41 GB, but when I clone my repo locally it is only 5 GB. Why is it so much smaller if I have the full history locally? Is it because Git compresses older files, which then get uncompressed when I check out an older commit?
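One piece of this may be that plain Git always clones the full commit history regardless of LFS: only the large-file *contents* are fetched lazily, and (as I understand it) any LFS object that has been downloaded once is cached under `.git/lfs/objects` and remains available offline, which would explain both the offline checkout and the 5 GB vs. 41 GB gap. The "full history is always local in plain Git" part is easy to demonstrate without LFS at all (paths below are illustrative):

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# A repo with two versions of a file
git init -q src
cd src
git config user.email you@example.com
git config user.name you
echo "version 1" > report.txt
git add report.txt
git commit -qm "v1"
echo "version 2" > report.txt
git commit -qam "v2"
cd ..

# Clone it, then delete the original to simulate losing network access
git clone -q "$tmp/src" copy
rm -rf src
cd copy

# Checking out the previous commit still works: the old blob is already local
git checkout -q HEAD~1
cat report.txt
```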
Questions
- Do I have a correct understanding of how LFS works?
- What is the best way to truncate my remote LFS history in order to clear up storage space?
Considering my ignorance on this subject, it's entirely possible that LFS isn't the right option for my use case. However, I have a lot of infrastructure set up around this current method, so I'm seeking responses that address my issue with LFS, rather than suggesting different storage options.
Edit
I now have a method for getting the result I want, but only locally. Now I need to know how to essentially replace the full history of the remote repository with what I have locally:
- Get the number of commits in the last 6 months:
git rev-list --count --since="6 months ago" HEAD
- Clone the repo with a depth parameter, using the commit count from the previous step:
git clone --depth <number of commits> <url>
- Fetch full object history:
git lfs fetch --all
- Pull (I don't think this is necessary but doing it just in case):
git lfs pull
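As a sanity check, steps 1–2 (count, then shallow-clone to that depth) can be verified on a throwaway repo with plain Git; the LFS steps are omitted here since they need an LFS server, and the dates and paths are made up for the illustration:

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

git init -q src
cd src
git config user.email you@example.com
git config user.name you
# One commit far outside any 6-month window...
GIT_AUTHOR_DATE="2000-01-01T00:00:00" GIT_COMMITTER_DATE="2000-01-01T00:00:00" \
  git commit -q --allow-empty -m "old"
# ...and one recent commit
git commit -q --allow-empty -m "recent"

# Step 1: count commits in the window
n=$(git rev-list --count --since="6 months ago" HEAD)
cd ..

# Step 2: shallow-clone to that depth (file:// because --depth is
# ignored for plain local paths)
git clone -q --depth "$n" "file://$tmp/src" trimmed
git -C trimmed rev-list --count HEAD
```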
How would I go about replacing the entire history of my remote repository with the contents of my local repository? It's important that the repository itself is kept (not deleted and recreated), because there is a deploy key associated with it that is installed in many different places and would be a major pain to update.