Git format-patch/bundle for human-readable sneakernet "pull/push"


I have two rooms in which I maintain some source code using git, a "dev" room where most development happens and a "deploy" room in which we actually use the software. Inevitably some changes happen in the deploy room as well. I'd like both rooms to share the same history in git.

Restrictions:

  1. For security reasons the two rooms are not network connected.
  2. Only text files (human readable) can leave the deploy room.

Moving changes into the deploy room is simple: we use git bundle and keep track of the last commit we moved into the deploy room. Moving changes out of the room is harder because of the text-only restriction.
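For reference, the dev-to-deploy direction described here can be sketched as follows. This is a minimal sketch, not the asker's actual script. It assumes a branch named master in both rooms and uses a hypothetical tag, last-sent-to-deploy, in the dev room to remember the last commit carried across. Both "rooms" are simply local repositories in a temporary directory.

```shell
#!/bin/sh
# Sketch of the dev -> deploy direction using git bundle.
# Hypothetical names: branch "master" in both rooms, and a tag
# "last-sent-to-deploy" in the dev room recording what was last shipped.
set -e
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.com

work=$(mktemp -d)   # both "rooms" live here for the demo

# --- Dev room: create some history and ship all of it once ------------
git init -q "$work/dev"
cd "$work/dev"
git symbolic-ref HEAD refs/heads/master
echo v1 > app.txt
git add app.txt
git commit -q -m "initial"
git bundle create "$work/full.bundle" master
git tag last-sent-to-deploy master          # remember what we shipped

# --- Deploy room: clone from the bundle (carried over on removable media)
git clone -q -b master "$work/full.bundle" "$work/deploy"

# --- Dev room: more work, then ship only the new commits --------------
cd "$work/dev"
echo v2 > app.txt
git commit -q -am "second"
git bundle create "$work/incr.bundle" last-sent-to-deploy..master
git tag -f last-sent-to-deploy master >/dev/null   # move the checkpoint

# --- Deploy room: check prerequisites, then fast-forward --------------
cd "$work/deploy"
git bundle verify "$work/incr.bundle"
git fetch -q "$work/incr.bundle" master
git merge -q --ff-only FETCH_HEAD
```

git bundle verify fails if the deploy room lacks a prerequisite commit, which makes it a cheap check that the tracked checkpoint is still correct before fetching.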

Goal: Move commits back and forth between my two unconnected rooms as if a git pull had happened, i.e., identical SHA1 hashes in both rooms.

So Far:

  • I've tried git format-patch to move changes from deploy back to dev, but this doesn't record merges, and therefore requires a different patch set to be generated for each contiguous set of changes along with some record of how to reproduce the exact merge commit which happened in between. There is some discussion about making diffs for merge commits, but this doesn't seem to capture the actual ancestry, only the changes. It seems that patches may not be a rich enough format to provide the necessary information.

  • A bundle-to-text script could convert the bundle into an uncompressed, human-readable(ish) format (and back again after the transfer), but I have found no evidence that such a script exists.

  • Perhaps a script could be written to walk the history from some common ancestor to the newest commit and either a) make a patch or b) recreate the merge of some commonly-known refs.
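The first point is easy to reproduce. The sketch below uses a throwaway repository in a temporary directory (all branch, tag and file names are hypothetical). It builds a history containing one merge and shows that git format-patch emits patches only for the ordinary commits.

```shell
#!/bin/sh
# Demonstrates that git format-patch skips merge commits.
set -e
export GIT_AUTHOR_NAME=Deploy GIT_AUTHOR_EMAIL=deploy@example.com
export GIT_COMMITTER_NAME=Deploy GIT_COMMITTER_EMAIL=deploy@example.com

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
echo base > a.txt
git add a.txt
git commit -q -m "base"
git tag base

# Two lines of development that are later merged.
git checkout -q -b topic
echo topic > b.txt
git add b.txt
git commit -q -m "topic work"
git checkout -q master
echo fix > c.txt
git add c.txt
git commit -q -m "master fix"
git merge -q --no-edit topic     # creates a real merge commit

# Three commits since "base" (two ordinary ones plus the merge)...
git rev-list --count base..master
# ...but format-patch writes only the two non-merge commits.
git format-patch -q -o "$work/out" base..master
ls "$work/out"
```

This is why a patch series on its own cannot reproduce the deploy room's merge commits, and therefore cannot reproduce its SHA1s either.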

Fallback: I could always squash the commits coming out of the deploy room into just one raw patch and break the history, but then further downloads from dev->deploy would break any existing working copies. Not ideal.

Update: I believe git fast-export may do what I need, although most examples run it on entire repositories rather than on partial histories the way git bundle does. I have a working toy example in which I can export a partial history into an out-of-date clone, but it requires hand-editing the fast-export output to add a from <sha1> line to the first commit. Without this modification, the import creates different SHA1s and then complains with Not updating refs/heads/master (new tip <hash> does not contain <master's hash>).
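Newer Git releases (2.21 and later, to my knowledge) add a git fast-export option, --reference-excluded-parents, that writes parents outside the exported range as from <sha1> lines. That is exactly the hand edit described above. The sketch below illustrates the option under stated assumptions and is not the asker's toy example. It uses a throwaway deploy repository and a bare dev.git clone (hypothetical names) and imports a two-commit partial export.

```shell
#!/bin/sh
# Partial-history fast-export whose first commit keeps its real parent.
set -e
export GIT_AUTHOR_NAME=Deploy GIT_AUTHOR_EMAIL=deploy@example.com
export GIT_COMMITTER_NAME=Deploy GIT_COMMITTER_EMAIL=deploy@example.com

work=$(mktemp -d)
git init -q "$work/deploy"
cd "$work/deploy"
git symbolic-ref HEAD refs/heads/master
echo one > f.txt
git add f.txt
git commit -q -m "shared commit"

# The dev room starts as a copy that will fall out of date.
git clone -q --bare "$work/deploy" "$work/dev.git"

# New work in the deploy room that the dev room has not seen.
echo two >> f.txt
git commit -q -am "deploy change 1"
echo three >> f.txt
git commit -q -am "deploy change 2"

# Export only the new commits; the excluded parent is written as
# "from <sha1>" instead of being dropped.
git fast-export --reference-excluded-parents master~2..master > "$work/stream.txt"

# Dev room: import the plain-text stream on top of its existing master.
git -C "$work/dev.git" fast-import --quiet < "$work/stream.txt"
```

Because author, committer, dates, message and tree are all carried in the stream, the imported commits get the same SHA1s as in the deploy room. This only removes the hand-editing step. The stream still contains whole file contents, so it does not help with the bandwidth problem.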

Update2: My git fast-export solution does work, but it has a bandwidth problem: the stream contains the full contents of every changed file rather than diffs against the previous versions. This is not acceptable, since I actually have to read all those extra lines.

Accepted answer (L. Robison):

I never found the perfect solution, but what we do now seems to work. The main drawback is that commits made in the deployment room initially have one SHA1, which then changes to a different SHA1 after being merged with the dev room. The good news is that git easily recognizes them as the same commits and can merge right through them.
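One way to see that git treats two such commits as "the same" is git cherry, which compares commits by patch ID (a hash of the diff) rather than by SHA1. The sketch below simulates both rooms as branches of one throwaway repository (the branch names deploy and master are hypothetical). It re-applies a deploy commit under a different committer on top of other dev work, so the SHA1s differ, and then asks git cherry whether the change is already upstream.

```shell
#!/bin/sh
# Same change, different SHA1: git cherry matches commits by patch ID.
set -e
export GIT_AUTHOR_NAME=Deploy GIT_AUTHOR_EMAIL=deploy@example.com
export GIT_COMMITTER_NAME=Deploy GIT_COMMITTER_EMAIL=deploy@example.com

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
echo base > app.txt
git add app.txt
git commit -q -m "base"

# "Deploy room" commit on its own branch.
git checkout -q -b deploy
echo fix > fix.txt
git add fix.txt
git commit -q -m "deploy fix"

# "Dev room": the same change re-applied by another committer on top of
# other dev work, so it gets a different SHA1.
git checkout -q master
echo feature > feature.txt
git add feature.txt
git commit -q -m "dev feature"
GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.com \
    git cherry-pick deploy >/dev/null

# A leading "-" means an equivalent change already exists in master.
git cherry master deploy
```

A leading + would instead mark a commit that still has to be sent to the dev room.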

  • There are 3 checkpoints we must keep track of:
    • dev/master which has the newest development in the dev room.
    • deploy/master which has the newest development in the deploy room.
    • dev_deploy_common which is the last commit the two histories share.


  1. When we move code from dev to deploy (using a bundle), we bring the commits into the dev_deploy_common branch in the deploy room (a git pull into dev_deploy_common). Then, from deploy/master, we run git merge dev_deploy_common and resolve any conflicts then and there.

  2. When we move code from deploy to dev (which must be a text file), we take a few extra steps:

  3. First we rebase deploy/master onto dev_deploy_common so that all of our patches are contiguous. This is generally easy since we've already handled any conflicts during the merges which occurred when bringing the bundle from dev to deploy.

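  4. Second, we generate a patch set using

    git format-patch -M25 -C25 --find-copies-harder -k --ignore-if-in-upstream --stdout dev_deploy_common > patchset.txt
    

    The -M25 -C25 --find-copies-harder options detect renames and copies (at a 25% similarity threshold), which reduces the output text size. The -k option keeps commit subjects intact. Passing dev_deploy_common as the base restricts the output to commits made since that checkpoint, and --ignore-if-in-upstream additionally skips any of them whose change is already in dev_deploy_common. --stdout writes the whole series to a single file.

    The result is patchset.txt, a single text file containing the collection of patches. It can be reviewed by hand and then moved to the dev room.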

  5. In the dev room we import the patchset using the following command:

    git am -k -3 --keep-cr --committer-date-is-author-date patchset.txt
    

    Unfortunately, even though we use every option we can to keep the patch exactly the same, a few of the commit's attributes still change. The main one is the committer, and the parent also changes whenever dev/master has moved past dev_deploy_common. As a result, the "same" commit will have different SHA1s in the dev and deploy rooms. This difference persists until we move a bundle back into the deploy room.

  6. When moving a bundle from dev to deploy, the merge operation (typically) recognizes the identical patches and seamlessly replaces the commit with the one in the dev history. See Step 1.