Single-file history format/library for binary files?

My application is going to edit a bunch of large files, completely unrelated to each other (belonging to different users), and I need to store checkpoints of the previous state of the files. Delta compression should work extremely well on this file format. I only need a linear history, not branches or merges.

There are low-level libraries that give part of the solution, for example xdelta3 sounds like a good binary diff/patch system.

RCS actually seems like a pretty close match to my problem, but doesn't handle binary files well.

git provides a complete solution to my problem, but is an enormous suite of programs, and its storage format is an entire directory.

Is there anything less complicated than git that would:

  • work on binary files
  • perform delta compression
  • let me commit new "newest" versions
  • let me recall old versions

Bonus points if it would:

  • have a single-file storage format
  • be available as a C, C++, or Python library

I can't even find the right combination of words to google for this category of program, so that would also be helpful.
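
To make the requirements above concrete, here is a minimal sketch of the kind of store being described, using only Python's standard library. It is an illustration, not an existing tool: the names `History`, `make_delta`, `apply_delta` and the table layout are hypothetical, and `difflib` merely stands in for a real binary differ such as xdelta3 (it would be far too slow for large files). The newest version is kept in full and older versions as reverse deltas, roughly the way RCS arranges a linear history.

```python
import difflib
import sqlite3
import struct
import zlib


def make_delta(base: bytes, target: bytes) -> bytes:
    """Encode `target` as copy/literal instructions relative to `base`."""
    out = bytearray()
    matcher = difflib.SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out += b"C" + struct.pack("<QQ", i1, i2 - i1)  # copy base[i1:i2]
        elif j2 > j1:  # "replace" or "insert": store literal bytes
            out += b"D" + struct.pack("<Q", j2 - j1) + target[j1:j2]
        # "delete" needs no instruction: the bytes are simply not copied
    return bytes(out)


def apply_delta(base: bytes, delta: bytes) -> bytes:
    """Rebuild the target from `base` and a delta made by make_delta."""
    out = bytearray()
    pos = 0
    while pos < len(delta):
        op = delta[pos:pos + 1]
        pos += 1
        if op == b"C":
            start, length = struct.unpack_from("<QQ", delta, pos)
            pos += 16
            out += base[start:start + length]
        else:  # b"D": literal data follows its length
            (length,) = struct.unpack_from("<Q", delta, pos)
            pos += 8
            out += delta[pos:pos + length]
            pos += length
    return bytes(out)


class History:
    """Linear history of one binary file, stored in a single SQLite file."""

    def __init__(self, path: str) -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS revisions ("
            "id INTEGER PRIMARY KEY, payload BLOB NOT NULL)"
        )

    def commit(self, data: bytes) -> int:
        """Store `data` as the newest revision and return its number."""
        head = self.db.execute("SELECT MAX(id) FROM revisions").fetchone()[0]
        with self.db:  # one transaction: demote old head, insert new head
            if head is not None:
                previous = self.checkout(head)
                reverse = zlib.compress(make_delta(data, previous))
                self.db.execute(
                    "UPDATE revisions SET payload = ? WHERE id = ?", (reverse, head)
                )
            cur = self.db.execute(
                "INSERT INTO revisions (payload) VALUES (?)", (zlib.compress(data),)
            )
        return cur.lastrowid

    def checkout(self, rev: int) -> bytes:
        """Return the contents of revision `rev`."""
        rows = self.db.execute(
            "SELECT id, payload FROM revisions WHERE id >= ? ORDER BY id DESC", (rev,)
        ).fetchall()
        if not rows or rows[-1][0] != rev:
            raise KeyError(f"no such revision: {rev}")
        data = zlib.decompress(rows[0][1])  # newest revision is stored in full
        for _, payload in rows[1:]:  # walk backwards through reverse deltas
            data = apply_delta(data, zlib.decompress(payload))
        return data
```

Usage would look like `h = History("drawing.hist")`, then `rev = h.commit(data)` for each checkpoint and `h.checkout(rev)` to recall one. Keeping the newest version in full makes the common case (reading the current state) cheap, at the cost of rewriting one row on every commit.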

There are 2 answers

gsl On BEST ANSWER

From the RCS manual (1. Overview):

[RCS] can handle text as well as binary files, although functionality is reduced for the latter.

RCS seems like a good option that is worth trying.

I work for a foundation that has been using RCS to keep tens of thousands of completely unrelated files under version control (git and hg are not an option for us). Most of them are text, but some are media files, which are binary in nature.

RCS does work quite well with binary files. Just make sure keyword substitution is turned off for them (for example with the -kb substitution mode), so that RCS does not inadvertently rewrite byte sequences that happen to look like a keyword such as $Id$.

To see whether this works for you, try it with, say, a Photoshop image: put it under version control with RCS, then change part of it or add a layer and commit the change. That will show how well RCS handles binary files in your case.

RCS has been serving us quite well. It is well maintained, reliable, predictable, and definitely worth a try.

Daniel Trugman On

Forgive me for asking, but my experience has taught me to challenge assumptions. I don't know why you need a 'single-file' solution, but my answer depends on that.

Option #1 - If you are simply looking for ease of use, have you considered using a single git repo to track multiple binaries?

With git's per-file history you can view the history of every file in the repo independently, create patches, and roll back without affecting the rest of the repo. Note that git revert undoes a whole commit, so this relies on a convention where each commit touches only one file. With that in place, you can find and roll back changes to an individual file using:

git log -- filename
git revert <commit-id>

Option #2 - If you have a system constraint that forces you to store a single file, I would recommend considering git bundle. Basically, it lets you pack a git repo into a single file for easier storage or relocation (I guess that's pretty much like zipping your repo and storing the zipped file).
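
Since the question asks for something usable from Python, here is a hedged sketch of that round trip driven through `subprocess`. The helper names `git`, `pack_repo` and `unpack_repo` are made up for illustration; the only git commands assumed are `git bundle create <file> --all` and `git clone <bundle> <dir>`, and a `git` executable must be on the PATH.

```python
import subprocess


def git(*args: str, cwd: str) -> str:
    """Run a git command inside `cwd` and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout


def pack_repo(repo_dir: str, bundle_path: str) -> None:
    """Write all refs of the repo into one bundle file.

    `bundle_path` should be absolute, because git runs inside `repo_dir`.
    """
    git("bundle", "create", bundle_path, "--all", cwd=repo_dir)


def unpack_repo(bundle_path: str, dest_dir: str) -> None:
    """Recreate a normal repository (with working tree) from a bundle file."""
    subprocess.run(
        ["git", "clone", bundle_path, dest_dir], check=True, capture_output=True
    )
```

The bundle is only the storage form: to commit a new version you would unpack it to a temporary directory, commit there, and pack it again, which is the main cost of this approach.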

Option #3 - Consider Fossil. I haven't used it, so I can't comment on its qualities, but it looks like it might meet your requirements.