Track files but exclude them from a git bundle

I have a somewhat complex Ansible workflow. I have two airgapped networks. I develop playbooks on both networks, so I have two somewhat independent Ansible repositories managed by git. At the same time, most of the playbooks can be used in both places. To complicate matters, this is a one-way transfer: I can transfer from network A to B, but not from B to A.

I have template files with information relevant to one network but not relevant on the other. I've designed it so that filenames should be the same (as well as variable names in Jinja2 templates). I want to be able to create a git bundle that excludes the files, so that when I pull from the bundle on the other network's repository, the files don't get overwritten. Because including the wrong information in the template files could conceivably break the entire environment, I need to track the Jinja2 template/variable files in Git.

Does anyone have a workflow recommendation, or a git command besides using .gitignore (because the files need to be tracked so I can roll back in emergencies) that will help me accomplish this?

Answer by torek (best answer)

There's no completely trivial way to do this.

Fundamentally, a file is tracked in Git if and only if it is in the index. The index is (normally, initially) populated from some commit, so it is some previous commit that determines whether a file is tracked. Assume there exist two sets of commits, T and U, that are similar except that some files present in the T commits are absent from the U commits. Then:

git checkout any-T-sub-i-commit

results in the file(s) being in the index (and hence tracked), while:

git checkout any-U-sub-j-commit

results in the file(s) being not-in the index (and hence untracked).

The same holds in a more general fashion for operations like merging: when you work with commits from set T, you work with the ones that have the files; when you work with commits from set U, you work with the ones that lack the files. If you merge any T_i commit with any U_j commit, the effect on any such file (whether it's added, removed, or conflicted) depends on whether the merge-base commit is in set T or set U, and on the specific changes to those files within commit T_i with respect to the merge-base commit.

Of course, as files move into or out of the index, Git also copies them into, or removes them from, the work-tree at the same time (with the usual caution about not removing unsaved-but-precious data). So this means that the work-tree file will vanish and reappear depending on whether you check out a T commit or a U commit.
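
The tracked/untracked behavior described above can be reproduced in a throwaway repository. This is only a sketch: the branch names with-templates (standing in for set T) and no-templates (standing in for set U), and the file templates/site.j2, are made up for illustration.

```shell
#!/bin/sh
set -eu

# Throwaway repository; every name below is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Branch with-templates plays the role of set T: the template is committed.
mkdir templates
echo "site: network-a" > templates/site.j2
echo "- hosts: all" > playbook.yml
git add playbook.yml templates/site.j2
git commit -q -m "T: playbook plus site template"
git branch -M with-templates

# Branch no-templates plays the role of set U: same content minus the template.
git checkout -q -b no-templates
git rm -q templates/site.j2
git commit -q -m "U: playbook only"

# On a U commit the file is absent from both the index and the work-tree.
tracked_on_u=$(git ls-files templates/site.j2)
if [ -e templates/site.j2 ]; then exists_on_u=yes; else exists_on_u=no; fi

# Checking out a T commit puts it back in both.
git checkout -q with-templates
tracked_on_t=$(git ls-files templates/site.j2)
if [ -e templates/site.j2 ]; then exists_on_t=yes; else exists_on_t=no; fi

echo "U: tracked='$tracked_on_u' in work-tree=$exists_on_u"
echo "T: tracked='$tracked_on_t' in work-tree=$exists_on_t"
```

Switching between the two branches makes templates/site.j2 disappear and reappear, which is exactly why "tracked, but not in the bundle" cannot be a per-file setting.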

Meanwhile, let's look at what a bundle is, at least in an abstract sense. The essence of a bundle is that it contains at least all the data that git fetch or git push would send across the wire, after the git fetch or git push communication process that serves to minimize this data. (It can contain extra data, which will simply be ignored.) This minimal data consists of all of the objects that must be copied—annotated tags, commits, trees, and blobs—plus the reference names and their values.

To exclude some set of files from the bundle, then, you need to bundle exclusively the U commits, and not any of the T commits. That's fine as far as it goes: if you have all branches duplicated, and distinguish between T commits and U commits by branch names, you can achieve this pretty easily. But the consequence is that every time you make a new T commit you must make a corresponding U commit, and vice versa. You have, in effect, doubled your workload.
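
As a rough sketch of the "distinguish by branch name" idea, the script below bundles only a branch that never contained the network-specific file. The layout is an assumption, not something prescribed above: a branch named shared holds the U commits, and a branch named network-a adds the local template on top of it. All names and paths are hypothetical.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
git init -q "$work/src"
cd "$work/src"
git config user.name "Example"
git config user.email "example@example.invalid"

# U commits: content that is safe to send to the other network.
echo "- hosts: all" > playbook.yml
git add playbook.yml
git commit -q -m "shared playbook"
git branch -M shared

# T commits: the shared content plus a network-specific template.
git checkout -q -b network-a
mkdir templates
echo "site: network-a" > templates/site.j2
git add templates/site.j2
git commit -q -m "network A template"

# Bundle only the shared branch; commits reachable only from
# network-a are not included.
git bundle create "$work/shared.bundle" shared
heads=$(git bundle list-heads "$work/shared.bundle")

# On the receiving side, the bundle behaves like a read-only remote.
git clone -q -b shared "$work/shared.bundle" "$work/dest"
received_files=$(git -C "$work/dest" ls-files)

echo "heads in bundle: $heads"
echo "files received:  $received_files"
```

The cost is the one described above: any change that should reach the other network has to land on the shared branch, and the network-specific branch then has to be brought up to date separately.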

The standard recommendation that applies to configuration files in general applies here as well: Do not commit any configuration, ever. Commit only sample or default or template configurations. Use some kind of wrapper to convert these sample configurations to real configurations. (The wrapper can also be committed, of course, if it's something you write yourself, such as a shell script or Python program or whatever.) You may now maintain, and version-control, these sample / default configurations. Cloning the repository obtains the samples, and updating the clone (git fetch followed by a merge or rebase) updates the samples, but does not touch the actual configuration. Depending on how smart the wrapper is and what's available in your output format [1], it can even auto-detect that the sample/default input has changed, and warn or fail any runs that use the prescribed tool (i.e., the wrapper itself) until the real configuration is updated to match any required changes coming from the sample/default/template configuration.
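
One possible shape for such a wrapper is sketched below. It is an illustration under assumptions, not a finished tool: the function name check_config, the file names site.yml.sample and site.yml, and the "# based-on-sample:" comment line are all invented here. The idea is to stamp the real (uncommitted) configuration with the hash of the sample it was derived from, and to refuse to proceed when the sample has since changed.

```shell
#!/bin/sh
set -eu

# check_config SAMPLE REAL   (hypothetical helper)
# Returns 0 only if REAL exists and records the hash of the current SAMPLE.
check_config() {
    sample=$1
    real=$2
    current=$(git hash-object "$sample")
    if [ ! -f "$real" ]; then
        # First run: seed the real config from the sample and stamp it.
        { echo "# based-on-sample: $current"; cat "$sample"; } > "$real"
        echo "created $real from $sample; edit it, then re-run" >&2
        return 1
    fi
    # The stamp lives in a YAML comment, so the consuming tool ignores it.
    recorded=$(sed -n 's/^# based-on-sample: //p' "$real" | head -n 1)
    if [ "$recorded" != "$current" ]; then
        echo "$sample changed since $real was written; update $real" >&2
        return 1
    fi
    return 0
}

# Demonstration in a scratch directory.
demo=$(mktemp -d)
sample="$demo/site.yml.sample"
real="$demo/site.yml"
printf 'ntp_server: CHANGE_ME\n' > "$sample"

# 1st call: no real config yet, so it is created and the run is stopped.
if check_config "$sample" "$real"; then first=ok; else first=stop; fi
# 2nd call: the stamp matches the sample, so the real tool could run here.
if check_config "$sample" "$real"; then second=ok; else second=stop; fi
# 3rd call: the sample gained a key (e.g. after a fetch), so the run is stopped.
printf 'ntp_server: CHANGE_ME\ndns_server: CHANGE_ME\n' > "$sample"
if check_config "$sample" "$real"; then third=ok; else third=stop; fi

echo "$first $second $third"
```

In this arrangement only the sample is committed (and therefore bundled); the real file stays untracked on each network, so pulling from a bundle cannot overwrite it. After reconciling the real file with a changed sample, its stamp line would need to be updated by hand or by a further helper.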

This is still not trivial—in particular, you may have to write a wrapper, and educate users on the correct way to run your particular system. But it's as close to trivial as you are likely to achieve.


[1] In this particular case, your output is most likely the YAML files for Ansible. This means you can hide all kinds of useful sample/default-config information in comments, for instance.