I recently tried to import a repository into GitHub (from Bitbucket), and the import failed. GitHub tech support informed me that they were seeing "bad date" issues in the repository and that I should run `git fsck` on it. So I cloned it from Bitbucket, ran `git fsck`, and this is what I got:
```
git fsck
Checking object directories: 100% (256/256), done.
error in commit fda45b4b6b06f6b815341c1f26de827c769f48b6: badDate: invalid author/committer line - bad date
error in commit 636d259fd0ac343af2a5561ff799a54a6aeb9b1c: badDate: invalid author/committer line - bad date
error in commit 41dc786816992e3c42c904e8c848aa1078475386: badDate: invalid author/committer line - bad date
error in commit c55a0fa0d98e02aa4621be202d7b7d21ed2ff2ab: badDate: invalid author/committer line - bad date
error in commit e6ad8f5ea7cf6441b6ea6ab5583117113a8f49fb: badDate: invalid author/committer line - bad date
error in commit 4aea97fdd999484319a9fbbc4dc42b024e1eba80: badDate: invalid author/committer line - bad date
error in commit 531f7783e383868c1d52a1bf2dc3212f5e10a91c: badDate: invalid author/committer line - bad date
Checking objects: 100% (546/546), done.
```
Well gosh, how did THAT happen? I have no idea how to even begin to fix that. Searching for "bad date" hasn't yielded any useful advice.
Would a kind git guru care to steer me in the right direction?
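Someone used a bad version of Git (or a bad tool to build Git objects). Who did it, when, and how are impossible to say from here, but if you examine the various bad commits, they will probably offer some very big clues, since the bad lines should have this general form:

```
author Jane Doe <jane@example.com> 1500000000 +0000
```

or:

```
committer Jane Doe <jane@example.com> 1500000000 +0000
```

The date-and-time stamp is the last two numeric fields: seconds since the Unix epoch, then a time-zone offset. The name and email address in between will likely tell you whom to ask which Git version they were using.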
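To examine a bad commit, `git cat-file -p <hash>` prints its raw text, header lines included. The short Python sketch below scans such text for `author`/`committer` lines that do not look like `author NAME <EMAIL> SECONDS TZ`. The pattern is a loose approximation assumed for illustration, not Git's actual fsck check, and the sample commit (name, email, and malformed date) is made up.

```python
import re
from typing import List

# Loose approximation (an assumption, NOT Git's real fsck code) of a
# well-formed ident line: "<kw> <name> <<email>> <seconds> <+|-HHMM>".
IDENT_RE = re.compile(r"^(?:author|committer) [^<>]*<[^<>]*> \d+ [+-]\d{4}$")


def suspicious_ident_lines(raw_commit: str) -> List[str]:
    """Return author/committer header lines that don't fit IDENT_RE."""
    # The commit header ends at the first blank line; the message follows it.
    header = raw_commit.split("\n\n", 1)[0]
    return [
        line
        for line in header.splitlines()
        if line.startswith(("author ", "committer ")) and not IDENT_RE.match(line)
    ]


# Hypothetical raw commit text, e.g. as printed by `git cat-file -p <hash>`.
sample = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "author Jane Doe <jane@example.com> 1500000000 +0000\n"
    "committer Jane Doe <jane@example.com> Fri Jul 14 2017 +0000\n"
    "\n"
    "Hypothetical commit with a malformed committer date\n"
)
print(suspicious_ident_lines(sample))
# ['committer Jane Doe <jane@example.com> Fri Jul 14 2017 +0000']
```

Only the header is scanned, so a commit message that happens to contain a line starting with "author " is not flagged.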
Technically, you can't fix the bad commits themselves. The reason is that, bad or not, the raw data in the commit is the source of the hash ID of the commit. Since the hash ID is the real name of the commit, the real name of the commit requires that the commit's data be bad. If you did fix them, they would become different commits, which would have different hash-ID-names.
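To see why a "fixed" commit is necessarily a different commit, here is a minimal Python sketch of how a default (SHA-1) repository names an object: the ID is the SHA-1 of a short header (the object type, a space, the body length in bytes, a NUL byte) followed by the raw object data. The commit bodies are hypothetical; repairing only the date yields a different ID.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 object ID, assuming a default SHA-1 (not SHA-256) repository."""
    header = b"%s %d\0" % (obj_type.encode(), len(body))
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical commit body whose author date is malformed.
bad = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author Jane Doe <jane@example.com> Fri Jul 14 2017 +0000\n"
    b"committer Jane Doe <jane@example.com> 1500000000 +0000\n"
    b"\n"
    b"Hypothetical commit\n"
)
# The same commit with only the date repaired.
fixed = bad.replace(b"Fri Jul 14 2017 +0000", b"1500000000 +0000")

print(git_object_id("commit", bad))
print(git_object_id("commit", fixed))  # a different ID: a different commit
```

As a sanity check, `git_object_id("blob", b"")` returns the same ID that `git hash-object /dev/null` prints for an empty file.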
As VonC said, to produce a new, incompatible, but corrected repository, you must replace each of these bad commits with new-and-improved ones, perhaps using `git filter-branch` or the new `git filter-repo`. Whatever tool or method you use, you'll need to provide some way to replace the bad `author` and/or `committer` lines in the commit headers of the bad commits with new, correct lines whose date-and-time stamps meet Git's internal requirements.

Having replaced the bad commits with corrected ones, you must now also replace every subsequent (descendant) commit, because the immediate children of these commits store their parent hash IDs (those of the bad commits) inside them as part of their data. So for each child you must write a new child commit that preserves everything except the parent hash ID, which now names the corrected commit. That invalidates the child commit's children, so those too must be rewritten, and so on: all descendant commits.
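The cascade can be sketched with a toy linear history in Python. Everything here (the helper names, the identity, the dates, the messages) is hypothetical, and a default SHA-1 repository is assumed; the point is that repairing only the root's date changes every descendant's ID, because each child's `parent` line holds its parent's ID.

```python
import hashlib
from typing import List, Optional

EMPTY_TREE = b"4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's empty tree


def commit_id(body: bytes) -> str:
    """SHA-1 ID of a commit object (default SHA-1 repository assumed)."""
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()


def make_commit(parent: Optional[str], date: bytes, msg: bytes) -> bytes:
    """Build a toy commit body (hypothetical helper, fixed identity)."""
    ident = b"Jane Doe <jane@example.com> " + date
    lines = [b"tree " + EMPTY_TREE]
    if parent is not None:
        # The child records its parent's ID as part of its own data.
        lines.append(b"parent " + parent.encode())
    lines += [b"author " + ident, b"committer " + ident, b"", msg]
    return b"\n".join(lines) + b"\n"


def build_chain(root_date: bytes, n: int) -> List[str]:
    """IDs of a linear history of n commits, oldest (root) first."""
    ids: List[str] = []
    parent: Optional[str] = None
    for i in range(n):
        date = root_date if i == 0 else b"1500000000 +0000"
        parent = commit_id(make_commit(parent, date, b"commit %d" % i))
        ids.append(parent)
    return ids


old = build_chain(b"Fri Jul 14 2017 +0000", 4)  # root has a malformed date
new = build_chain(b"1500000000 +0000", 4)       # root's date repaired
# Only the root's data changed, yet every commit gets a new ID.
print(all(a != b for a, b in zip(old, new)))  # True
```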
This is just what these filter-branch / filter-repo tools do. You (somehow) pick out a bad commit in the repository, and they copy it to a new-and-improved commit instead. Then they copy all descendants of the original bad commit as well, so that there is a new family tree descending from the corrected commit.
Since the set of commits in a repository is the history in that repository, the result of copying all of these commits is a whole new history: in effect a new repository, which all users of the old repository must now switch over to. That is why the technical part of correcting the repository is usually the easiest part of this whole process. It takes some work to figure out what's wrong and how to use the tools to rewrite history, but you do that once and you're done. The hard part is tracking down every user of the old repository and somehow convincing them to stop using it and start using the new and improved one instead.