I back up my CSS userstyles to a git repo like so:
❯ fd
stylus-2021-05-18.json
stylus-2021-05-20.json
These backup files are obviously mostly the same, i.e., stylus-2021-05-18.json is the past history of stylus-2021-05-20.json. How is this handled by git?
Obviously, I could just rename the files to stylus.json and let git handle the version control completely, but I was wondering if git is smart enough that it could work with these files automatically.
TL;DR
Commits always store full file snapshots, but garbage collection creates packfiles, which store similar blobs efficiently using delta compression, whether they come from the same file or not.
Intro
My understanding of Git storing "diffs" rather than full files was all wrong. After doing some reading and some experiments, I see that it doesn't matter whether you modify a file or create a copy of a file: when you commit the change or the new file, Git creates a brand new blob, every time.
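Strictly speaking, a blob is identified by a hash of its content, so a byte-identical copy reuses the existing blob, while any change, however small, yields a complete new blob. A minimal sketch of that, using git hash-object (which only computes the id and writes nothing); the file names and CSS content are made up for illustration:

```shell
#!/bin/sh
# Sketch: blob ids depend only on file content, not on name or history.
set -eu
cd "$(mktemp -d)"                       # scratch directory, made up for this demo

printf 'a { color: red; }\n'  > v1.css  # hypothetical userstyle content
cp v1.css copy.css                      # byte-identical copy under another name
printf 'a { color: blue; }\n' > v2.css  # same style after a small edit

id_v1=$(git hash-object v1.css)
id_copy=$(git hash-object copy.css)
id_v2=$(git hash-object v2.css)

echo "v1:   $id_v1"
echo "copy: $id_copy"   # same id as v1: one blob would be shared
echo "v2:   $id_v2"     # different id: a whole new blob would be stored
```

So every edited backup, whatever it is called, ends up as its own full blob when it is first committed.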
But that's pretty inefficient, because you end up with a lot of different copies of the same text, with small diffs between blobs. That problem gets fixed when Git creates packs. I don't fully understand how Git searches for things to pack, but inside a pack, it will store some blobs whole and others as diffs from other blobs.
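The loose-blobs-then-pack behaviour can be reproduced in a throwaway repository. The script below is only a sketch: the scratch directory, the seq-generated file contents, and the demo committer identity are all made up for illustration, and the sizes it produces will not match the ones from my own files.

```shell
#!/bin/sh
# Sketch: two near-identical "backups" become two loose blobs, then one pack.
set -eu

repo=$(mktemp -d)          # hypothetical scratch location
cd "$repo"
git init -q .

# A text file of a bit under 1 MB; the content is arbitrary filler.
seq 1 150000 > stylus-2021-05-18.json
git add stylus-2021-05-18.json
git -c user.name=demo -c user.email=demo@example.com commit -q -m "first backup"

# A second backup under a new name, differing only by one appended line.
cp stylus-2021-05-18.json stylus-2021-05-20.json
echo "one more line" >> stylus-2021-05-20.json
git add stylus-2021-05-20.json
git -c user.name=demo -c user.email=demo@example.com commit -q -m "second backup"

# Before packing: every object is a separate zlib-compressed loose file.
git count-objects -v

# Force packing, then look again: the objects now live in a single pack.
git gc -q
git count-objects -v

# List the pack contents, one line per object.
git verify-pack -v .git/objects/pack/pack-*.idx
```

In the verify-pack listing, an object stored as a delta carries two extra columns, its delta depth and the id of its base object. With files this similar, I would expect one of the two blobs to show up that way.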
Experiment
At this point, find .git -ls shows me one big blob (3.5MB) storing this 6.9MB file.
At this point, find .git -ls shows me two big blobs, each about 3.5MB. Seems pretty inefficient to me, but read on...
Things don't get better: find .git -ls shows me three big blobs, each about 3.5MB!
Now, at some point when you push, Git might pack your sandbox, but we can force that to happen right now: run git gc. That's not just garbage collection, as I incorrectly thought; it also creates packfiles. After running git gc, find .git -ls now reports a single pack of about 3.2MB. So my three big blobs were identified as similar, better compressed, and stored efficiently. I think this is what Git calls "delta compression".
References
Online posts I just read to answer this question: