Storing large files in git without copying


I'm looking for a way to store large files efficiently in git without having multiple local copies.

I've tried git lfs, but it creates a copy of every modified file in .git/lfs/objects/ on every commit. This means I need at least twice as much disk space, and that's only if I run git lfs prune regularly.

I understand that this is done to preserve the normal git workflow, but my data runs to hundreds of GBs, so this isn't really workable.


1 Answer

Answered by LightBender:

Git is designed and optimized for storing text files that can be versioned over time. Because of this, it has historically come up rather short when dealing with large binary files. While git LFS is a great way to integrate the storage of large binaries into a git workflow, it is still not what git was designed for.

By the nature of how LFS is implemented, keeping local copies of large files is unavoidable. Fundamentally, it's just a mechanism for connecting git directly to a binary file archive.
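
To make that concrete: with LFS, what git itself commits is only a small pointer file, roughly like the illustration below (the hash and size are placeholders). The real content has to live somewhere else, which in practice means one copy in your working tree, one under .git/lfs/objects/, and one on the LFS server.

```
version https://git-lfs.github.com/spec/v1
oid sha256:<64-character hash of the real file>
size 1073741824
```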

If you're dealing with binaries to the tune of hundreds of gigabytes, git is probably not the right tool for your needs (except perhaps for any text files stored alongside them in your projects). You may be trying to ram a square peg into a round hole.

As an architect buddy of mine says, "When all you have is a hammer, everything starts to look like a screw."

I deal almost exclusively in source code, so I can't make a concrete recommendation beyond this: look into document management systems designed for media artifacts.

If you're certain you want to stick with git, you might be able to build a mechanism similar to a package manager that pulls down the artifacts you need on demand, driven by a configuration file and script you can store in your repo.
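
For what it's worth, a minimal sketch of that idea might look like the following Python script. It assumes a JSON manifest named artifacts.json committed to the repo (the file name, fields, and URLs are all illustrative), downloads anything missing into a git-ignored artifacts/ directory, and verifies checksums so corrupted or stale copies get re-fetched.

```python
#!/usr/bin/env python3
"""Sketch of a package-manager-style artifact fetcher.

Expected manifest shape (artifacts.json, committed to the repo):
[
  {"name": "scan-001.raw",
   "url": "https://storage.example.com/scan-001.raw",
   "sha256": "<hex digest of the file>"}
]
All names, fields, and URLs here are illustrative assumptions.
"""

import hashlib
import json
import urllib.request
from pathlib import Path

MANIFEST = Path("artifacts.json")  # small text file, versioned by git
DEST_DIR = Path("artifacts")       # large files land here; add to .gitignore


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so huge files never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_artifacts() -> None:
    """Download any artifact that is missing or fails its checksum."""
    DEST_DIR.mkdir(exist_ok=True)
    for entry in json.loads(MANIFEST.read_text()):
        target = DEST_DIR / entry["name"]
        if target.exists() and sha256_of(target) == entry["sha256"]:
            continue  # already present and intact
        print(f"fetching {entry['name']} ...")
        urllib.request.urlretrieve(entry["url"], target)
        if sha256_of(target) != entry["sha256"]:
            raise RuntimeError(f"checksum mismatch for {entry['name']}")


if __name__ == "__main__":
    fetch_artifacts()
```

That way git only ever versions the manifest, and the bulk data lives on whatever storage you already have (an object store, a file server, and so on). The trade-off is that you lose LFS's automatic smudge/clean integration, so you have to run the fetch step yourself or hook it into your build.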