git - fatal: your current branch appears to be broken (possibly from interrupted pull)


Problem:

  1. accidentally did a git commit --amend and pushed it to a USB key from the first computer
  2. pulled from the usb key to a second computer
  3. second computer repository is now corrupted
  4. git pull to first computer results in a merge conflict; confused about whether this would be a corrupted state as well (if the --amend is corrupt)

Symptoms:

most commands:

fatal: your current branch appears to be broken

.git/refs/heads/master:

$ cat .git/refs/heads/master

file contents of .git/refs/heads/master:

'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

$ git status
new file: ...
new file: ... [for every file in the repository; expected since on a corrupted branch]

I'm not sure I am willing to blame git commit --amend since it seems somewhat benign; maybe something else happened.

How to fix?:

Is fixing this as simple as taking the hash from the latest good commit of .git/logs/refs/heads/master and inserting it manually into .git/refs/heads/master? How would I do this?

If so, should I destroy the offending commit (so it doesn't corrupt things later on, with say a git-repack or something)?

When I tried to less .git/logs/refs/heads/master and took the [edit:typo] first hash from the last line of the form...

...
[hash for HEAD~2] [hash for HEAD~1] [authorname] ...
[hash for HEAD~1] [hash for HEAD] [authorname] ...
^^^^^^^^^^^^^^^^^   (corrupted)
EOF

... and then pasted into the .git/refs/heads/master file, I am now stuck with...

$ git fsck
error: inflate: data stream error (unknown compression method)
error: unable to unpack header of .git/objects/8f/1da374ffac3711f8cdde57379f90cb03bbb9ea
error: 8f1da374ffac3711f8cdde57379f90cb03bbb9ea: object corrupt or missing: .git/objects/8f/1da374ffac3711f8cdde57379f90cb03bbb9ea
error: inflate: data stream error (unknown compression method)
error: unable to unpack header of .git/objects/ac/2fcd052804fb7adac465220da5bcb04d008fc7
error: ac2fcd052804fb7adac465220da5bcb04d008fc7: object corrupt or missing: .git/objects/ac/2fcd052804fb7adac465220da5bcb04d008fc7
Checking object directories: 100% (256/256), done.
Checking objects: 100% (1147/1147), done.
error: inflate: data stream error (unknown compression method)
error: unable to unpack 8f1da374ffac3711f8cdde57379f90cb03bbb9ea header
error: inflate: data stream error (unknown compression method)
error: unable to unpack 8f1da374ffac3711f8cdde57379f90cb03bbb9ea header
fatal: loose object 8f1da374ffac3711f8cdde57379f90cb03bbb9ea (stored in .git/objects/8f/1da374ffac3711f8cdde57379f90cb03bbb9ea) is corrupt

I could try destroying the loose object, but I'm not sure if it is, in turn, a pointer to more corrupt objects (e.g. a tree) that must be destroyed as well. I could certainly try to destroy this object if I knew the command (or could try to rm it on a backup); should I try that?

Furthermore, how would I repair the USB key and other repo from this mess? Thanks.

(potentially useful references for newbies, unlikely to help people capable of answering this question: https://aboullaite.me/deep-dive-into-git-git-refs/ ) (similar question fatal: your current branch appears to be broken -- does not specify cause of error; many things may cause this error)


edit:

I did rm .git/objects/... for every object listed above, and now I have...

$ git fsck
Checking object directories: 100% (256/256), done.
Checking objects: 100% (1147/1147), done.
error: refs/remotes/origin/HEAD: invalid sha1 pointer 0000000000000000000000000000000000000000
error: refs/remotes/origin/master: invalid sha1 pointer 0000000000000000000000000000000000000000
error: HEAD: invalid reflog entry 8f1da374ffac3711f8cdde57379f90cb03bbb9ea
error: refs/heads/master: invalid reflog entry 8f1da374ffac3711f8cdde57379f90cb03bbb9ea
error: bad ref for .git/logs/refs/remotes/origin/HEAD
error: bad ref for .git/logs/refs/remotes/origin/master
error: ac2fcd052804fb7adac465220da5bcb04d008fc7: invalid sha1 pointer in cache-tree
broken link from    tree b0d598ef5427d59ed31eb1b315c761fc89af40b7
              to    tree ac2fcd052804fb7adac465220da5bcb04d008fc7
dangling blob f4e39c36cc8df3f9f324c0ccca4ed6a7a3ffe6ac
dangling tree 068716abcf815b4eaf8f0fe74c3020bf6251bba0
dangling blob fb4cfe7c94e8b4d800fdb4935806577b2b99fd94
dangling blob 35cf2ca2ed03811c14f1598c50daacfab9032b8f
missing tree ac2fcd052804fb7adac465220da5bcb04d008fc7
dangling blob d056e38af637cf0de76dac5689a8c5e735d75793
dangling blob 3b3903cc7b4eb035e9c4508024acc3f81c015741
dangling blob b09c3cc95935a327ecf7fad8374f14c4e320f67e

There are 3 answers

0
VonC On

Git 2.40 (Q1 2023) should be more robust in a similar scenario (interrupted pull).

First, don't forget to activate core.fsync=reference, as I mentioned here (Git 2.36+).
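As a rough sketch, the setting can be turned on per repository like this. The function name enable_ref_fsync is a hypothetical helper for illustration, not something Git provides, and the setting only has an effect on Git 2.36 or later (older versions ignore the key).

```shell
# enable_ref_fsync [REPO]
# Hypothetical helper: adds "reference" to the components Git hardens
# with fsync in the given repository (default: current directory).
enable_ref_fsync() {
    git -C "${1:-.}" config core.fsync reference
}
```

Calling enable_ref_fsync /path/to/repo writes the setting into that repository's own config; git config --get core.fsync run there should then print reference.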

Git 2.40 fixes the code that fsyncs the $GIT_DIR/packed-refs file: it previously forgot to flush its buffered output before syncing the file to disk.

See commit ce54672 (20 Dec 2022) by Patrick Steinhardt (pks-t).
(Merged by Junio C Hamano -- gitster -- in commit 3ed91c5, 02 Jan 2023)

refs: fix corruption by not correctly syncing packed-refs to disk

Signed-off-by: Patrick Steinhardt

At GitLab we have recently received a report where a repository was left with a corrupted packed-refs file after the node hard-crashed even though core.fsync=reference was set.
This is something that in theory should not happen if we correctly did the atomic-rename dance to:

  1. Write the data into a temporary file.
  2. Synchronize the temporary file to disk.
  3. Rename the temporary file into place.

So if we crash in the middle of writing the packed-refs file we should only ever see either the old or the new state of the file.

And while we do the dance when writing the packed-refs file, there is indeed one gotcha: we use a FILE * stream to write the temporary file, but don't flush it before synchronizing it to disk.

As a consequence any data that is still buffered will not get synchronized and a crash of the machine may cause corruption.

Fix this bug by flushing the file stream before we fsync.

0
ninjagecko On

[answering own question for now, but I would like to accept an answer that actually works]

  • computer 1:
    • run git fsck on the first repo to confirm it was not corrupt
    • copy the repo folder to a backup on computer 1
    • commit anything on computer 1 (and/or maybe rebase if it's only you)
    • push to usb key (option: run git fsck on usb key beforehand; was not sure how to run on a bare repo so I cloned it to a temp repo...)
  • computer 2:
    • copy corrupt repo folder on computer 2 to a backup
    • git clone from usb key to restore repo

This is obviously a non-answer but works for now ("good enough"), but the correct way to solve this would probably benefit the community.

2
torek On

The root of the problem is most likely that you removed the USB key before or during the OS's action of writing to the USB key. This left a number of damaged files.

Files damaged by an improper shutdown of a computer for any reason (pulling a USB key is one, but so are operating system crashes, power failures, computers catching on fire, and so on) tend to result in damage to the most recently written files. Files that have just been sitting quietly tend to be intact. There are a lot of caveats, but this general principle applies here.

  • Files containing branch information are likely to be damaged if you were updating branches.

    Here, the damaged files included .git/refs/heads/master. This file should contain 41 bytes, which consist of a 40-byte text representation of the hash ID of the commit that should be considered the tip of branch master, followed by an ASCII newline character. There is no definite way to guess which commit is the correct tip commit, which is why Git stores the answer to the question "what is the hash ID of the tip commit of master?" in some files.

    (In some cases a valid answer might be available in .git/packed-refs, but in general, if .git/refs/heads/master exists, it is supposed to hold the correct answer. The correct answer changes over time, by you creating new commits, by you running git reset, by you running git branch -f, and so on.)

  • The damaged files are likely to include .git/index, which holds Git's index or staging area. Both terms mean the same thing in Git, and there is a third now-rarely-used term, the cache, for this same file. Except while you are using the index to work on a conflicted merge, the stuff in the index file is mostly pretty easy to re-compute with some effort: Git uses it to make things go faster, i.e., as a cache, hence the third rarely-used term.

    If (as is mostly the case) the cache contains nothing that cannot be recomputed, you can simply remove .git/index and then run git reset to recompute it. That would make all the "new file ..." messages go away, with the only thing lost being whether you had staged some particular updates.

    The word mostly appears here often because the index contains the hash IDs of all of the blob objects ("files") that you intend to put in your next commit. If you have done something unusual, such as create a new unique version of a file and put that into the index and then remove it from everywhere else, the hash ID of that unique version of the file may not be easy to find anywhere. You could get this when using git add -p or git reset -p to stage a third variant of a file that differs from both the committed version and the work-tree version. In this case, after removing and re-creating the index (with rm .git/index; git reset), the easiest way to reconstruct that staged variant is to re-run the git add -p or git reset -p operation.

    In your case, it seems possible that the index was erased entirely, leading to Git claiming that every file was new. This is the same thing you get if you run rm .git/index, so you might as well run git reset. However, there's definitely something else going on here because of this message:

    error: ac2fcd052804fb7adac465220da5bcb04d008fc7: invalid sha1 pointer in cache-tree
    

    The cache-tree here refers to stuff in the index, using its old name cache. But since this is "cache", removing and re-creating the index could get past the problem, depending on other things.

  • The damaged files are likely to include some of your work-tree files. Git cannot help you with these: it is the rest of your computer that manages these files. Git writes copies of files to your work-tree, when you git checkout some existing commit, so that they are in a form that you can see and work with, but after that, it's all yours / your-computer's to manage.

  • The damaged files are likely to include some of Git's internal objects. In your case, this apparently did happen:

    error: unable to unpack header of .git/objects/8f/1da374ffac3711f8cdde57379f90cb03bbb9ea
    error: 8f1da374ffac3711f8cdde57379f90cb03bbb9ea: object corrupt or missing: .git/objects/8f/1da374ffac3711f8cdde57379f90cb03bbb9ea
    

    This means object 8f1da374ffac3711f8cdde57379f90cb03bbb9ea is damaged. It's impossible to tell what type this object was, much less what data were in it and whether they were valuable, without further information.

  • The damaged files may include various reflogs, stored in .git/logs/. In this case you got:

    invalid reflog entry 8f1da374ffac3711f8cdde57379f90cb03bbb9ea
    

    and:

    error: refs/heads/master: invalid reflog entry 8f1da374ffac3711f8cdde57379f90cb03bbb9ea
    

    using this same number. That's the damaged object hash ID we saw just a moment ago. Since branch names are required to point to commit objects, we can now guess that 8f1da374ffac3711f8cdde57379f90cb03bbb9ea was a commit object, before it got damaged. It was probably a recently-created commit, such as the one made by the git commit --amend.
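The 41-byte layout of .git/refs/heads/master described in the first bullet can be checked mechanically. Here is a minimal sketch, assuming a SHA-1 repository (SHA-256 repositories use 64 hex digits) and a ref stored as a loose file; check_ref_file is a hypothetical helper name, not a Git command.

```shell
# check_ref_file PATH
# Hypothetical helper: succeeds only if PATH holds exactly 40 lowercase
# hex digits followed by a single newline (41 bytes in total).
check_ref_file() {
    # The file must be exactly 41 bytes long.
    [ "$(wc -c < "$1" | tr -d ' ')" -eq 41 ] || return 1
    # read fails if no newline is found, which is the case for a NUL-filled file.
    IFS= read -r line < "$1" || return 1
    # Reject any character that is not a lowercase hex digit.
    case $line in
        *[!0-9a-f]*) return 1 ;;
    esac
    # The first line must account for all 40 digits.
    [ "${#line}" -eq 40 ]
}
```

Running check_ref_file .git/refs/heads/master on the file shown in the question would fail, since it holds only NUL bytes. A pass only means the file is well-formed; it does not prove that the named commit exists or is the right tip.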

From all of this, we can conclude—but it's still a guess—that the git commit --amend commit itself got wrecked, and that the broken hash ID in master should have been the text 8f1da374ffac3711f8cdde57379f90cb03bbb9ea. The index was probably damaged or removed entirely, but often it's safe enough to just remove it and rebuild its cache aspect, so you can remove it again (if necessary) and then rebuild it from the current commit once you have a sensible "current commit". Some of your reflogs may be damaged, but reflogs are all auxiliary data anyway: nothing in them is critical to Git's own operation, so the damaged ones can be truncated. The biggest problem is the damaged .git/objects/ file.
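Candidates for that "sensible current commit" are recorded in the branch reflog, .git/logs/refs/heads/master, where each line begins with the ref's previous and new hash IDs. Here is a minimal sketch that lists them newest first, assuming the reflog file is still readable text and the repository uses 40-digit SHA-1 IDs; reflog_candidates is a hypothetical helper name.

```shell
# reflog_candidates REFLOG_FILE
# Hypothetical helper: prints the distinct hash IDs found in the first two
# fields of each reflog line, most recent first, skipping the all-zero ID
# that marks "no previous value".
reflog_candidates() {
    awk '
        # Remember the old ID, then the new ID, of every entry in file order.
        { id[++n] = $1; id[++n] = $2 }
        END {
            # Walk backwards so the newest entries are printed first.
            for (i = n; i >= 1; i--) {
                h = id[i]
                if (length(h) == 40 && h ~ /^[0-9a-f]+$/ && h !~ /^0+$/ && !seen[h]++)
                    print h
            }
        }
    ' "$1"
}
```

Each candidate can then be checked with git cat-file -t <id>, which prints commit only if the object is present and readable. In this case the first candidate would likely be the damaged 8f1da374... object, so the next one down is the one to try.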

If you're willing to lose that commit, it may be OK to simply remove the damaged object and put a valid hash ID into .git/refs/heads/master. When you did that, your fsck still complained:

error: ac2fcd052804fb7adac465220da5bcb04d008fc7: invalid sha1 pointer in cache-tree
broken link from    tree b0d598ef5427d59ed31eb1b315c761fc89af40b7
              to    tree ac2fcd052804fb7adac465220da5bcb04d008fc7

Now, that first line still refers to something read from .git/index. The remove-and-reset will rebuild the index/cache from the commit you selected by writing its hash ID into .git/refs/heads/master. If tree object ac2fcd052804fb7adac465220da5bcb04d008fc7 is not used anywhere else in the repository, that might leave you with an intact repository.
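After the remove-and-reset, re-running git fsck shows what is still broken. Here is a minimal sketch that reduces the report to the object IDs flagged as missing or corrupt, assuming the message formats shown in the question's output (they may differ between Git versions); fsck_problem_ids is a hypothetical helper name.

```shell
# fsck_problem_ids
# Hypothetical helper: reads `git fsck` output on stdin and prints each
# object ID reported as missing or corrupt, once, in sorted order.
fsck_problem_ids() {
    awk '
        # Lines of the form: missing <type> <id>
        /^missing / { print $3 }
        # Lines of the form: error: <id>: object corrupt or missing: <path>
        /object corrupt or missing/ { sub(/:$/, "", $2); print $2 }
    ' | sort -u
}
```

It would be used as git fsck 2>&1 | fsck_problem_ids; the redirection matters because fsck prints its error: lines on standard error. If the list comes back empty, fsck no longer reports any missing or corrupt objects.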

If it doesn't, though, there will only be two ways to make this particular Git repository self-consistent:

  1. Remove all commits—and all of their descendants if any—that refer, directly or indirectly, to that tree object.

  2. Obtain or reconstruct the missing object. If it is in some other Git repository (some clone of this one), that's the easy way to get it. Run git cat-file tree ac2fcd052804fb7adac465220da5bcb04d008fc7 in the repository that has it; the output is the raw tree data. Pipe that into git hash-object -t tree -w --stdin in the damaged repository, so that the object exists there again. (The human-readable listing printed by git cat-file -p is not valid input for git hash-object -t tree.)

Note that this obtain-or-reconstruct method works for any damaged or missing object: the hash ID is globally unique across every clone, so if you can find the object in some other clone, you can copy the object from that other clone.
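Copying one object between clones can be sketched as a small shell function. This assumes both repositories are reachable on the same machine (for example, the USB key is mounted); copy_object is a hypothetical helper name, not a Git command. It reads the raw object data with git cat-file <type>, not the pretty-printed -p form, so the stored bytes hash back to the same ID.

```shell
# copy_object SRC_REPO DST_REPO HASH
# Hypothetical helper: copies one object from an intact clone (SRC_REPO)
# into a damaged one (DST_REPO) and verifies the ID is unchanged.
copy_object() {
    src=$1 dst=$2 hash=$3
    # Ask the intact clone what kind of object this is (blob, tree, commit, tag).
    kind=$(git -C "$src" cat-file -t "$hash") || return 1
    # Stream the raw contents out of the source and write them into the destination.
    new=$(git -C "$src" cat-file "$kind" "$hash" |
          git -C "$dst" hash-object -t "$kind" -w --stdin) || return 1
    # Succeed only if the object landed under the ID it had in the source.
    [ "$new" = "$hash" ]
}
```

For the missing tree above, the call would be copy_object /path/to/good-clone /path/to/damaged-clone ac2fcd052804fb7adac465220da5bcb04d008fc7 (both paths are placeholders). This copies a single object only; anything the tree refers to must already exist in the destination or be copied the same way.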

Conclusion

how would I repair the USB key and other repo from this mess?

The clearest way is to use some other clone. Find an undamaged clone of the same repository. Make a clone of that clone, to be your new "repaired clone" result. To that new clone, add new objects from the damaged repository (new tags, trees, blobs, and commits) that are themselves undamaged. (You can copy the loose object files directly, with cp, or run git cat-file <type> <hash> in the damaged clone and pipe the output to git hash-object -t <type> -w --stdin in the new one.) This way you are only reading from the damaged clone.

When you hit a damaged object, recover as much useful data as possible, and move on.

The result will always be a good and valid clone with as much recovered data as possible and you will know exactly what you recovered and what you lost. You can even have commits in which some files are missing (because their internal blob object was damaged irrecoverably): you can save a note somewhere, perhaps using git notes or perhaps just on paper, about this and go back and reconstruct what you can later, for instance.

This method tends to be slow and painful. The method you used—attempt to repair the damaged clone in-place—is faster and easier, but may leave you with some hidden issues (e.g., lost commits that you just don't remember, that came after one that you deliberately clobbered because it was damaged).