Make zstd compressed files 'rsyncable' like gzip does with --rsyncable option


Is there a way to make zstd compressed files 'rsyncable' like gzip does with --rsyncable option?

I've tried splitting input files into fixed length chunks and compressing them separately with no luck.

About the --rsyncable option:

When you synchronize a compressed file between two computers, this option allows rsync to transfer only files that were changed in the archive instead of the entire archive. Normally, after a change is made to any file in the archive, the compression algorithm can generate a new version of the archive that does not match the previous version of the archive. In this case, rsync transfers the entire new version of the archive to the remote computer. With this option, rsync can transfer only the changed files as well as a small amount of metadata that is required to update the archive structure in the area that was changed.


There are 2 answers

silinxey (best answer)

Since version 1.3.8, zstd has its own --rsyncable option.

ArtemGr

I've tried splitting input files into fixed length chunks and compressing them separately with no luck.

This should work without problems, provided that you only change bytes in place and do not shift them.

That is, if you split "The hog crawled under the high fence" into fixed-size chunks ["The hog ", "crawled ", "under th", "e high f", "ence"] and then independently compress them, then changing "hog" to "dog" will be rsync-friendly, because the compressed version of the remaining chunks, ["crawled ", "under th", "e high f", "ence"], will still be the same.

If, on the other hand, you shift the bytes, as when you replace "hog" with "hedgehog", then fixed-size splitting no longer helps: the chunks ["The hedg", "ehog cra", "wled und", "er the h", "igh fenc", "e"] are all different now, and so are their compressed versions. (A shift that happens to be an exact multiple of the chunk size is the one exception, because the later chunks line up again.)

Rsync will help with the former but not with the latter.
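The fixed-size case can be checked with a rough sketch like the one below. It is only an illustration under a few assumptions: zlib from the Python standard library stands in for zstd, the helper names are made up, and the longer replacement word "hedgehog" is chosen so that the shift (5 bytes) is not a multiple of the 8-byte chunk size.

```python
import zlib

CHUNK_SIZE = 8  # bytes, matching the 8-character chunks in the example


def compress_fixed_chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Compress each fixed-size slice of `data` independently (zlib as a stand-in for zstd)."""
    return [zlib.compress(data[i:i + size]) for i in range(0, len(data), size)]


def reusable_chunks(old: list, new: list) -> int:
    """Count compressed chunks of `new` that already exist somewhere in `old`."""
    known = set(old)
    return sum(chunk in known for chunk in new)


original = b"The hog crawled under the high fence"
in_place = b"The dog crawled under the high fence"       # same length, bytes changed in place
shifted = b"The hedgehog crawled under the high fence"   # 5 bytes longer, later bytes shift

base = compress_fixed_chunks(original)
print(reusable_chunks(base, compress_fixed_chunks(in_place)))  # 4 (of 5 chunks)
print(reusable_chunks(base, compress_fixed_chunks(shifted)))   # 0 (of 6 chunks)
```

Chunks this small compress badly, of course; a real chunk size would be far larger, with the same behaviour.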

If you want to support arbitrary modifications, you need a smarter chunk-splitting algorithm that places chunk boundaries at points determined by the content of the file rather than by fixed offsets. For example, if you split "The hog crawled under the high fence" after each space, into "The ", "hog ", "crawled ", "under ", "the ", "high ", "fence", then replacing "hog" with a longer word such as "caterpillar" changes only one compressed chunk, so rsync does not need to transfer the rest of them.
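The split-on-space idea can be sketched the same way. Again this is only an illustration: zlib stands in for zstd, and `split_after_spaces` and `compress_chunks` are hypothetical helper names.

```python
import zlib


def split_after_spaces(data: bytes) -> list:
    """Split `data` so that every chunk ends right after a space (content-defined boundary)."""
    pieces = data.split(b" ")
    chunks = [piece + b" " for piece in pieces[:-1]]
    if pieces[-1]:  # trailing text that is not followed by a space
        chunks.append(pieces[-1])
    return chunks


def compress_chunks(chunks: list) -> list:
    """Compress every chunk independently (zlib as a stand-in for zstd)."""
    return [zlib.compress(chunk) for chunk in chunks]


old = compress_chunks(split_after_spaces(b"The hog crawled under the high fence"))
new = compress_chunks(split_after_spaces(b"The caterpillar crawled under the high fence"))

known = set(old)
changed = [chunk for chunk in new if chunk not in known]
print(len(changed), "of", len(new), "chunks changed")  # 1 of 7 chunks changed
```

Because the boundaries follow the content, the insertion only affects the chunk it falls into; every later chunk compresses to exactly the same bytes as before.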

P.S. It looks like LBFS (the Low-Bandwidth File System) uses such a chunk-splitting scheme: "by sliding a 48 byte window over the file and computing the Rabin fingerprint of each window. When the low 13 bits of the fingerprint are zero LBFS calls those 48 bytes a breakpoint and ends the current block and begins a new one".
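A minimal sketch of that kind of breakpoint detection follows. It keeps the 48-byte window and the "low 13 bits are zero" rule from the quote, but it is not LBFS's code: a simple polynomial rolling hash is swapped in for the Rabin fingerprint, and `BASE`, `MOD` and the function name are assumptions made for the illustration.

```python
import random

WINDOW = 48            # window size in bytes, as in the LBFS description
MASK = (1 << 13) - 1   # a breakpoint is declared when the low 13 bits are zero
BASE = 257             # assumption: polynomial rolling hash, NOT a real Rabin fingerprint
MOD = (1 << 61) - 1    # assumption: large prime modulus for the rolling hash


def split_content_defined(data: bytes) -> list:
    """Cut `data` after every position whose trailing 48-byte window hashes to a breakpoint."""
    chunks, start, h = [], 0, 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte that leaves the window
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # add the newest byte
        if i + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last breakpoint
    return chunks


rng = random.Random(0)
old_data = bytes(rng.getrandbits(8) for _ in range(200_000))
new_data = old_data[:100] + b"caterpillar" + old_data[100:]  # insert 11 bytes near the start

old_chunks = split_content_defined(old_data)
new_chunks = split_content_defined(new_data)
known = set(old_chunks)
reused = sum(chunk in known for chunk in new_chunks)
print(reused, "of", len(new_chunks), "chunks are unchanged after the insertion")
```

Since each breakpoint depends only on the 48 bytes in front of it, the boundaries re-synchronise shortly after the inserted bytes, and compressing each chunk independently then leaves most of the compressed output identical.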