What are the dos and don'ts of custom objects and refs?


Let's say I want to write a small helper that lets me append some metadata to a repository in a way that can propagate to clones via refs. Here is a simple example (a prototype of a git notes clone that doesn't even attach the notes to any other Git object):

hash=$(echo "Just a comment" | git hash-object -w --stdin)
git update-ref refs/comments/just "$hash"

That is, I create a blob whose id is stored in $hash and point refs/comments/just at it, so git fsck --unreachable won't report it and git gc will never prune the object.
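Regarding "propagate to clones": by default git clone and a plain git fetch only map branches and tags, so a ref under refs/comments/ has to be pushed and fetched with an explicit refspec. A minimal sketch, assuming a POSIX shell and throwaway repositories in a temp directory (the paths and the remote name origin are illustrative):

```shell
#!/bin/sh
# Sketch: share a refs/comments/* ref between repositories.
# All repositories are throwaway directories under a temp dir.
set -eu

work=$(mktemp -d)
git init -q "$work/local"
git init -q --bare "$work/origin.git"
git -C "$work/local" remote add origin "$work/origin.git"

# The same two commands as in the question, run inside the local repository.
hash=$(echo "Just a comment" | git -C "$work/local" hash-object -w --stdin)
git -C "$work/local" update-ref refs/comments/just "$hash"

# Custom namespaces are not pushed by default; name them explicitly.
git -C "$work/local" push -q origin 'refs/comments/*:refs/comments/*'

# A plain clone only maps branches and tags, so the comment ref is missing.
git clone -q "$work/origin.git" "$work/clone"
if git -C "$work/clone" show-ref --verify --quiet refs/comments/just; then
  before_fetch=present
else
  before_fetch=absent
fi
echo "after plain clone: refs/comments/just is $before_fetch"

# Add a fetch refspec so every later 'git fetch' also mirrors refs/comments/*.
git -C "$work/clone" config --add remote.origin.fetch '+refs/comments/*:refs/comments/*'
git -C "$work/clone" fetch -q origin
git -C "$work/clone" cat-file -p refs/comments/just
```

Because the clone copies only heads and tags, anyone consuming the metadata needs that extra fetch refspec (or an explicit fetch) once per clone.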

But that is of course a very simple example; in reality I'm interested in more complex features. So my question is: what can I "legally" do, and what should I absolutely refrain from?

For example, several posts on SE were about users having to recover from duplicate tree entries, so one "don't" is "don't create a tree with duplicate entries". Another is "do make sure your objects are reachable, so git prune won't remove them". What else?

Can I create a custom object type? Use "invalid" filemodes for blobs in trees? Where can I find an overview? Or should I read git-fsck's source to see what constitutes an error (and which errors can be ignored)?

Accepted answer by VonC:

dos and don'ts of custom objects and refs?

Dos:

  • Back Up Your Repo: Before making significant changes to a repository's internal structure, create a backup. As I have recommended before, git bundle create /tmp/foo-all --all is a simple way to capture every ref, and all the objects they reach, in a single file.

  • Use a Distinct Namespace: If you're introducing custom refs, try to use a distinct namespace (like refs/comments/ in your example) to avoid any collisions with Git's conventional ref names.

  • Ensure Object Reachability: your custom objects should always be reachable from some ref, to avoid accidental pruning by git gc or the more recent git maintenance.

  • Test in a Separate Repo: Before applying your customizations to a primary or production repository, test in a separate or cloned repository to confirm your assumptions and ensure that there are no unexpected consequences.

  • Adhere to Object Types: Stick with the four primary object types (blob, tree, commit, and tag) for maximum compatibility. If you're trying to store custom data, it usually makes sense to store it as a blob and then reference it from a tag or commit.
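The Dos above can be combined into a short workflow. This is a minimal sketch, assuming a POSIX shell and a reasonably recent Git; the scratch repository is a temp directory, and the ref name refs/comments/main, the file name comment.txt, the metadata text and the placeholder author identity are all made up for illustration:

```shell
#!/bin/sh
# Sketch: store metadata the "boring" way (blob -> tree -> commit -> ref)
# in a scratch repository, after taking a bundle backup.
set -eu

# Placeholder identity (hypothetical) so commits work without user.* config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# "Test in a separate repo": a throwaway scratch repository.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git commit -q --allow-empty -m "initial"   # git refuses to bundle an empty repo

# "Back up your repo": capture every ref in one file and check it.
git bundle create "$repo.bundle" --all
git bundle verify "$repo.bundle" >/dev/null

# "Adhere to object types": the data itself is an ordinary blob...
blob=$(printf 'reviewed-by: alice\n' | git hash-object -w --stdin)
# ...listed in an ordinary tree with a standard mode
# (mktree input format: "<mode> SP <type> SP <id> TAB <name>")...
tree=$(printf '100644 blob %s\tcomment.txt\n' "$blob" | git mktree)
# ...recorded by an ordinary commit...
commit=$(git commit-tree -m "Add comment" "$tree")
# ..."use a distinct namespace" and "ensure reachability": anchor it in refs/comments/.
git update-ref refs/comments/main "$commit"

# Sanity checks: the metadata is readable and nothing is dangling or broken.
git cat-file -p refs/comments/main:comment.txt
git fsck --full --strict --no-dangling
```

Wrapping the blob in a tree and commit, rather than pointing the ref at the blob directly, means a later update can record the previous state as its parent (git commit-tree -p), so the metadata gets a history of its own.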

Don'ts:

  • Avoid Duplicate Tree Entries: As you noted, tree objects should not contain duplicate entries. This can lead to unexpected behavior.

  • Don't Use Invalid Filemodes: While it might be tempting to use custom file modes for blobs in trees, it's likely to cause problems. Stick with the recognized modes: 040000 for a subdirectory (tree), 100644 for a regular file (blob), 100755 for an executable file, 120000 for a symbolic link, and 160000 for a gitlink (a submodule commit).

  • Avoid Creating Custom Object Types: Git recognizes four primary object types (blob, tree, commit, and tag). Introducing custom object types would likely break Git's internal mechanisms and tools that expect only these four.

  • Don't Modify Existing Objects: The integrity of Git relies on the immutability of objects. Once an object is created, it should never be changed. If changes are needed, create a new object and update the references accordingly.

  • Avoid Inconsistencies with the Object Hash: The SHA-1 hash (or SHA-256, in repositories created with git init --object-format=sha256) is integral to object identification and verification in Git. Any custom operation that could produce a mismatch between an object's content and its hash is a big no-no.
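Several of these Don'ts can be observed directly. A minimal sketch, again in a throwaway repository with made-up names (refs/comments/note, note.txt, the object type name comment, and a placeholder identity): "changing" metadata produces new objects instead of altering old ones, git hash-object (without --literally) refuses an unknown object type, and git fsck --strict runs at the end because it checks, among other things, object hashes, tree entry modes and duplicate tree entries.

```shell
#!/bin/sh
# Sketch: what "don't modify objects" and "no custom types" look like in practice.
set -eu

# Placeholder identity (hypothetical) so commit-tree works without user.* config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Objects are content-addressed: different content always gets a different id.
old_blob=$(printf 'v1\n' | git hash-object -w --stdin)
new_blob=$(printf 'v2\n' | git hash-object -w --stdin)

old_tree=$(printf '100644 blob %s\tnote.txt\n' "$old_blob" | git mktree)
old_commit=$(git commit-tree -m "v1" "$old_tree")
git update-ref refs/comments/note "$old_commit"

# "Editing" means writing new objects and moving the ref; old objects stay intact.
new_tree=$(printf '100644 blob %s\tnote.txt\n' "$new_blob" | git mktree)
new_commit=$(git commit-tree -p "$old_commit" -m "v2" "$new_tree")
# Passing the expected old value makes update-ref refuse if the ref moved meanwhile.
git update-ref refs/comments/note "$new_commit" "$old_commit"

# Git rejects object types it does not know about.
if printf 'x\n' | git hash-object -t comment --stdin >/dev/null 2>&1; then
  custom_type=accepted
else
  custom_type=rejected
fi
echo "custom object type: $custom_type"

# After any low-level surgery, let fsck validate hashes, modes and tree entries.
git fsck --full --strict --no-dangling
```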


jthill asked in the comments:

what specific difference is there between "notes that doesn't even attach the notes to any other git object" and just an ordinary sideband branch (with its own root)?

  • Git notes are a way to append arbitrary metadata to objects without modifying the objects themselves. Typically, this means adding notes to commits. Notes are stored in their own refs, typically under refs/notes/, but they "attach" to another object (like a commit) by referring to that object's hash.

  • A sideband branch is just a regular branch, but perhaps used for a purpose different from the primary branches. It has its own commit history and tree. It's stored under refs/heads/ just like any other branch.
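For comparison with the custom refs/comments/ approach, this is how the built-in notes mechanism attaches metadata to a commit. A minimal sketch in a throwaway repository (the commit message, note text and identity are placeholders); refs/notes/commits is Git's default notes ref:

```shell
#!/bin/sh
# Sketch: attach metadata to a commit with git notes.
set -eu

# Placeholder identity (hypothetical); adding a note creates a commit on the notes ref.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git commit -q --allow-empty -m "some work"

# Attach a note to HEAD; it is recorded under refs/notes/commits by default.
git notes add -m "Just a comment" HEAD
git notes show HEAD

# The notes ref is itself a commit whose tree maps annotated object ids to note blobs.
git notes list HEAD            # prints the id of the note blob for HEAD
git cat-file -t refs/notes/commits

# Like refs/comments/*, notes are not transferred by default; with a remote
# named origin you would push and fetch them explicitly, e.g.:
#   git push origin 'refs/notes/*'
#   git fetch origin 'refs/notes/*:refs/notes/*'
```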

Hence jthill's recommendation: create a "mynotes" branch and check it out with git worktree (which I have described in another answer).
That gives this metadata its own working directory, separate from your main checkout, while still sharing the repository's object store and refs.
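A minimal sketch of that suggestion in a throwaway repository. The branch name mynotes comes from the text; the file name just.txt, the worktree path and the identity are placeholders. The root commit is built with plumbing (git mktree on empty input writes the empty tree), so the side branch shares no history with the main branch:

```shell
#!/bin/sh
# Sketch: a "mynotes" side branch with its own root, checked out in a separate worktree.
set -eu

# Placeholder identity (hypothetical) so commits work without user.* config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git commit -q --allow-empty -m "main work"

# Create an unrelated root commit (empty tree, no parent) for the side branch.
empty_tree=$(git mktree </dev/null)
root=$(git commit-tree -m "Start mynotes" "$empty_tree")
git branch mynotes "$root"

# Check the side branch out in its own directory; the main checkout is untouched.
notes_dir="$repo-mynotes"
git worktree add "$notes_dir" mynotes

# Work in the side worktree with ordinary porcelain commands.
echo "Just a comment" > "$notes_dir/just.txt"
git -C "$notes_dir" add just.txt
git -C "$notes_dir" commit -q -m "Add comment"

# Both worktrees share one object store and one set of refs.
git log --oneline mynotes
git worktree list
```

Because mynotes lives under refs/heads/, it is cloned, fetched and pushed like any other branch, which sidesteps the refspec configuration that a custom namespace needs.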