Is there a way to logically mark a group of files with a label in Git? I understand how tags and GitLab labels work, but in both cases those markers are applied to commits.
The ETL applications we use impose a specific directory tree that is not very flexible when it comes to identifying that 3 files belong to Solution X and 2 other files (in the same directory) belong to Solution Y.
Marking or labeling a subset of files, instead of placing them in feature-specific folders or encoding the solution in the naming convention, would make them easier to identify.
The files, BTW, are either XML (ETL jobs) or flat files (SQL/DDL).
How would you do this?
As others have said, there's nothing built in to do this. We might, however, note that Git stores four kinds of objects in the repository itself: blobs (files), trees (representing directories-full-of-files), commits (which form a directed acyclic graph through parent identifiers, with each commit carrying one tree object, one author, one committer, its set-of-parents, and an arbitrary log message), and annotated-tag objects (which have no strongly defined relationship, but every tag has exactly one target object, normally a commit).
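To make the four object kinds concrete, here is a minimal sketch that builds a throwaway repository and asks Git for the type of each object. The file name job.sql, the tag name v0, and the identity values are made up for illustration only.

```shell
#!/bin/sh
# Sketch: inspect the four Git object types in a scratch repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
# Identity for this scratch repo only (hypothetical values).
git config user.name "Example"
git config user.email "example@example.com"

echo 'SELECT 1;' > job.sql
git add job.sql
git commit -q -m 'first commit'
git tag -a -m 'an annotated tag' v0 HEAD

blob_type=$(git cat-file -t HEAD:job.sql)     # the file's contents
tree_type=$(git cat-file -t 'HEAD^{tree}')    # the top-level directory
commit_type=$(git cat-file -t HEAD)           # the commit itself
tag_type=$(git cat-file -t v0)                # the annotated tag object
echo "$blob_type $tree_type $commit_type $tag_type"
```

The four git cat-file -t calls should report blob, tree, commit, and tag, in that order.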
Aside: how git notes work

Git's notes are represented internally as commits. Each commit has, as its tree, a set of "files" whose names merely happen to be (entirely by accident :-) ... cough, ahem) exactly the same[1] as some set of commits in the repository. When git log goes to display a commit whose ID is C, it "accidentally" checks to see if refs/notes/commits exists, and if so, whether a file named C exists in the commit to which refs/notes/commits points, and if so, it appends the contents of that file to the log message. So this is how notes attach to commits: one built-in part of Git checks to see if a special reference (refs/notes/commits) points to a commit containing a tree containing a "file" (and it really is a file in the end, as it's an ordinary Git blob object) that should be tacked on to the commit log message.

When you revise the set of notes, Git simply makes a new commit with a new tree. The new commit points back to the previous refs/notes/commits commit as its parent, so that the older notes remain in existence and can (with some difficulty) be viewed as they were in the past (this used to be very hard; I believe it has become easier). Git's natural pack-file compression handles these quite well, so that the space occupied by notes grows only linearly with new note additions.

[1] For efficiency, the "name" gets modified somewhat: a note file that would be named deadbeefcafebabec0ffeedecadefadedbedcede might, for instance, be named de/ad/beefcafe... instead.
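The notes mechanism described in this aside can be poked at directly. The following is a rough sketch in a scratch repository; the file name, note texts, and identity values are invented for the example. It adds a note, looks at what refs/notes/commits points to, then revises the note to show the new notes commit chaining onto the old one.

```shell
#!/bin/sh
# Sketch: see how git notes stores a note as a blob in a tree in a commit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"              # hypothetical identity
git config user.email "example@example.com"
echo 'SELECT 1;' > job.sql
git add job.sql
git commit -q -m 'first commit'

git notes add -m 'belongs to Solution X' HEAD

commit_id=$(git rev-parse HEAD)
notes_type=$(git cat-file -t refs/notes/commits)
# The notes tree holds one "file"; its path (minus any fan-out slashes)
# is the ID of the commit the note is attached to.
note_path=$(git ls-tree -r --name-only refs/notes/commits)
note_name=$(printf '%s' "$note_path" | tr -d /)
note_text=$(git cat-file -p "refs/notes/commits:$note_path")

# Revising the note makes a new notes commit whose parent is the old one.
git notes add -f -m 'belongs to Solution Y' HEAD
notes_commits=$(git rev-list --count refs/notes/commits)
echo "$notes_type $note_name $notes_commits"
```

Here refs/notes/commits is the default notes ref; notes kept under a different ref would need that name substituted.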
Thus, the solution is obvious (ahem)
You want to represent a set of files arranged in a directory or series of directories. That is, of course, a tree, and Git has tree objects. Therefore you should create a tree object to hold one of these states.
You did not say whether you wish to keep multiple historical versions of this tree. If you do, the solution is obvious: handle them just as git notes does, using a new commit object to store each new tree, chaining the commits to make past versions retrievable. If not, it's up to you whether to create a commit object at all, as a single tag object could point directly to the tree object. (Some non-Git tools may have issues with tags to anything other than a commit or another tag. There are a few of these tags in the Git repository for Git itself, though.)

In any case, you will also need one top-level reference to point to your commit-or-annotated-tag-object that points to the most current tree.
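For the no-history variant, a tag pointing straight at a tree might look like the sketch below. It uses the porcelain git tag -a as a shortcut for writing the tag object and its reference in one step, and for brevity it tags the whole tree of HEAD rather than a hand-picked subset. The tag name solution-x and the file name are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: an annotated tag whose target is a tree, not a commit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"              # hypothetical identity
git config user.email "example@example.com"
echo '<job/>' > job.xml
git add job.xml
git commit -q -m 'first commit'

# Any tree ID works here; this one is simply the tree of HEAD.
tree=$(git rev-parse 'HEAD^{tree}')
git tag -a -m 'Solution X files, no history kept' solution-x "$tree"

tag_type=$(git cat-file -t solution-x)           # the tag object itself
target_type=$(git cat-file -t 'solution-x^{}')   # what the tag points to
files=$(git ls-tree --name-only solution-x)
echo "$tag_type -> $target_type: $files"
```

The tag doubles as the top-level reference (refs/tags/solution-x), at the cost of keeping only the latest tree.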
Creating the tree is easy: simply populate an index file (not the regular one, but an alternate, whose name you write into the environment variable GIT_INDEX_FILE) with the file names as you wish them to appear in your tree, by git add-ing the files (this will also put the necessary blobs into the repository if they are not there already), then invoke git write-tree. This will turn the index into the desired tree object (after which you may, and probably should, discard the exported GIT_INDEX_FILE setting), printing the new object's ID to standard output as usual. Writing a commit and/or tag object to point to this tree is then merely a matter of invoking git commit-tree and/or git mktag (which will write their own new objects to the repository, printing the IDs to standard output as before). Last, use git update-ref to create or update a reference to point to the tag or commit object, and you have now re-implemented git notes, but in a form more suitable for your own desires.

You can extract any saved tree any time, to any work-tree of your choice, with a simple git checkout using another temporary index.
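Put together, the recipe might look something like this. Treat it as a sketch under assumptions rather than a finished tool: the ref name refs/labels/solution-x, the file names, the index file names under .git, and the identity values are all invented for the example, and it takes the commit-with-history route rather than git mktag.

```shell
#!/bin/sh
# Sketch: label a subset of files by storing them as a tree under a custom ref.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"              # hypothetical identity
git config user.email "example@example.com"

# Five files in one directory: three for Solution X, two for Solution Y.
mkdir jobs
for f in x1.xml x2.xml x3.sql y1.xml y2.sql; do
    echo "content of $f" > "jobs/$f"
done
git add jobs
git commit -q -m 'all ETL files'

# 1. Build a tree holding only Solution X's files, via an alternate index.
#    The index file must not exist yet; Git creates it on first use.
tmp_index="$repo/.git/label-index"
GIT_INDEX_FILE="$tmp_index" git add jobs/x1.xml jobs/x2.xml jobs/x3.sql
tree=$(GIT_INDEX_FILE="$tmp_index" git write-tree)
rm -f "$tmp_index"

# 2. Wrap the tree in a commit, chained to the previous label commit if any.
ref=refs/labels/solution-x                  # hypothetical ref name
if parent=$(git rev-parse -q --verify "$ref"); then
    commit=$(git commit-tree "$tree" -p "$parent" -m 'Solution X files')
else
    commit=$(git commit-tree "$tree" -m 'Solution X files')
fi

# 3. Point the top-level reference at the new commit.
git update-ref "$ref" "$commit"
labeled=$(git ls-tree -r --name-only "$ref")

# 4. Extract the labeled files into a separate directory, again using a
#    temporary index so the regular index and work-tree are left alone.
out=$(mktemp -d)
extract_index="$repo/.git/extract-index"
GIT_INDEX_FILE="$extract_index" git --work-tree="$out" checkout "$ref" -- .
rm -f "$extract_index"
echo "$labeled"
```

Re-running steps 1 to 3 with a different file list adds a new commit on top of the old one, so earlier versions of the label stay reachable through the ref's history.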