Let's say I want to write a small helper that lets me attach some metadata to a repository in a way that can propagate to clones via refs. A simple example (a git-notes clone prototype that doesn't even attach the notes to any other git object):
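# Write the comment into the object database as a blob and keep its ID
hash=$(echo "Just a comment" | git hash-object -w --stdin)
# Point a custom ref at the blob so that it stays reachable
git update-ref refs/comments/just "$hash"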
That is, I create a blob with hash $hash and refer to it as refs/comments/just, so git fsck --unreachable won't complain about it and git gc will never prune the object.
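To make the reachability point concrete, here is a minimal sketch run in a throwaway repository. The temporary directory and the second, unreferenced blob are illustrative assumptions and not part of the original helper.

```shell
# Throwaway repository so nothing real is touched (hypothetical location)
repo=$(mktemp -d)
git init -q "$repo"

# Blob that a custom ref points at
kept=$(echo "Just a comment" | git -C "$repo" hash-object -w --stdin)
git -C "$repo" update-ref refs/comments/just "$kept"

# Second blob that no ref points at, for comparison
loose=$(echo "Nobody points at me" | git -C "$repo" hash-object -w --stdin)

# List the custom namespace, then ask fsck which objects are unreachable
git -C "$repo" for-each-ref refs/comments/
report=$(git -C "$repo" fsck --unreachable 2>/dev/null)
echo "$report"
```

Only the second blob should appear in the fsck report; the one behind refs/comments/just is treated like any other referenced object.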
That is of course a very simple example; in reality I'm interested in more complex features. So my question is: what can I "legally" do, and what should I absolutely refrain from?
As an example, several posts on SE were about users having to recover from duplicate tree entries. So one "don't" is "don't create a tree with duplicate entries". Another example is "do make sure your objects are reachable, so git prune won't remove them". What else?
Can I create a custom object type? Use "invalid" filemodes for blobs in trees? Where can I find an overview? Or should I check git-fsck's source manually to see what constitutes errors (and which ones are ignorable)?
Dos:
Back Up Your Repo: Before making significant alterations to a repository's internal structure, create a backup, for example with git bundle create /tmp/foo-all --all, as I recommended before.
Use a Distinct Namespace: If you're introducing custom refs, use a distinct namespace (like refs/comments/ in your example) to avoid collisions with Git's conventional ref names.
Ensure Object Reachability: Your custom objects should always be reachable from some ref, to avoid accidental pruning by git gc or the more recent git maintenance.
Test in a Separate Repo: Before applying your customizations to a primary or production repository, test in a separate or cloned repository to confirm your assumptions and ensure that there are no unexpected consequences.
Adhere to Object Types: Stick with the four primary object types (blob, tree, commit, and tag) for maximum compatibility. If you're trying to store custom data, it usually makes sense to store it as a blob and then reference it from a tag or commit.
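The first few items above can be sketched as follows, assuming throwaway directories from mktemp and a placeholder commit identity (both hypothetical). One detail worth knowing for the "propagate to clones" goal: a plain git clone copies branches and tags only, so a custom namespace such as refs/comments/ needs an explicit refspec.

```shell
# Source repository with one commit (placeholder identity, not a real user)
src=$(mktemp -d)
git init -q "$src"
git -C "$src" -c user.name="Example" -c user.email="example@example.invalid" \
    commit -q --allow-empty -m "Initial commit"

# Back up every ref before touching anything, then check the bundle
backup="$(mktemp -d)/backup.bundle"
git -C "$src" bundle create "$backup" --all >/dev/null 2>&1
git -C "$src" bundle verify "$backup" >/dev/null 2>&1 && echo "backup verified"

# Add the custom metadata under its own namespace
hash=$(echo "Just a comment" | git -C "$src" hash-object -w --stdin)
git -C "$src" update-ref refs/comments/just "$hash"

# Try it out in a separate clone; refs/comments/ must be fetched explicitly
clone="$(mktemp -d)/clone"
git clone -q "$src" "$clone"
git -C "$clone" fetch -q origin '+refs/comments/*:refs/comments/*'
git -C "$clone" cat-file -p refs/comments/just
```

To make the extra namespace part of every later fetch in that clone, the same refspec could be added with git config --add remote.origin.fetch.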
Don'ts:
Avoid Duplicate Tree Entries: As you noted, tree objects should not contain duplicate entries. This can lead to unexpected behavior.
Don't Use Invalid Filemodes: While it might be tempting to use custom file modes for blobs in trees, it's likely to cause problems. Stick with the recognized modes detailed here (040000 for a subdirectory (tree), 100644 for a file (blob), 100755 for an executable, and 120000 for a symbolic link).
Avoid Creating Custom Object Types: Git recognizes four primary object types (blob, tree, commit, and tag). Introducing custom object types would likely break Git's internal mechanisms and tools that expect only these four.
Don't Modify Existing Objects: The integrity of Git relies on the immutability of objects. Once an object is created, it should never be changed. If changes are needed, create a new object and update the references accordingly.
Avoid Inconsistencies with SHA-1: The SHA-1 hash is integral for object identification and verification in Git. Any custom operation that might produce an inconsistency between the content of an object and its hash is a big no-no.
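Some of these "don'ts" can be checked from the command line. The sketch below is illustrative only: the type name mytype and the file name comment.txt are made up, and it assumes a repository using the default SHA-1 object format and a system that provides sha1sum.

```shell
repo=$(mktemp -d)
git init -q "$repo"

# 1. An unknown object type (hypothetical name "mytype") is refused
if echo "data" | git -C "$repo" hash-object -t mytype --stdin >/dev/null 2>&1; then
    custom_type="accepted"
else
    custom_type="rejected"
fi
echo "custom object type: $custom_type"

# 2. Build a tree with a recognized mode (100644) instead of inventing one
blob=$(echo "Just a comment" | git -C "$repo" hash-object -w --stdin)
tree=$(printf '100644 blob %s\tcomment.txt\n' "$blob" | git -C "$repo" mktree)
git -C "$repo" ls-tree "$tree"

# 3. An object ID is the hash of "<type> <size>\0<content>", so content and
#    ID cannot legitimately disagree (assumes SHA-1 and sha1sum)
size=$(( $(echo "Just a comment" | wc -c) ))
manual=$( { printf 'blob %d\0' "$size"; echo "Just a comment"; } | sha1sum | cut -d' ' -f1)
echo "git:    $blob"
echo "manual: $manual"
```

The two IDs printed at the end should be identical, which is why rewriting an object's content in place makes it fail verification.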
jthill asked in the comments:
Git notes are a way to append arbitrary metadata to objects without modifying the objects themselves. Typically, this means adding notes to commits. Notes are stored in their own refs, usually under refs/notes/, but they "attach" to another object (like a commit) by referring to that object's hash.
A sideband branch is just a regular branch, but perhaps used for a purpose different from the primary branches. It has its own commit history and tree. It's stored under refs/heads/ just like any other branch.
Hence, jthill's recommendation: create a "mynotes" branch and use git worktree (that I presented here). That would make a separate workspace for this metadata, completely isolated from your main work.
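A minimal sketch of both approaches, under some assumptions: the repository lives in a temporary directory, the commit identity is a placeholder, and the file name comments.txt is made up. The branch is rooted in an empty tree via plumbing so that it shares no history with the main branch; that is one possible way to start it, not necessarily the one jthill had in mind.

```shell
repo=$(mktemp -d)
git init -q "$repo"
# Helper that supplies a placeholder identity (hypothetical values)
g() { git -c user.name="Example" -c user.email="example@example.invalid" "$@"; }
g -C "$repo" commit -q --allow-empty -m "Initial commit"

# Approach 1: a note attached to an existing commit, stored under refs/notes/
g -C "$repo" notes add -m "Reviewed" HEAD
git -C "$repo" notes show HEAD

# Approach 2: a sideband branch with its own history, rooted in an empty tree
empty_tree=$(git -C "$repo" mktree </dev/null)
root=$(g -C "$repo" commit-tree "$empty_tree" -m "Start metadata history")
git -C "$repo" update-ref refs/heads/mynotes "$root"

# Check the branch out in its own worktree, away from the main working copy
meta="$(mktemp -d)/meta"
git -C "$repo" worktree add "$meta" mynotes >/dev/null 2>&1
echo "Just a comment" > "$meta/comments.txt"
g -C "$meta" add comments.txt
g -C "$meta" commit -q -m "Add comment"
git -C "$repo" log --oneline mynotes
```

Because mynotes is an ordinary branch under refs/heads/, it is cloned, pushed, and fetched like any other branch, with no extra refspec needed.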