How can I fix "bad date" issues in a git repository?


I recently tried to import a repository into GitHub (from Bitbucket) and the import kept failing. GitHub tech support informed me that they were seeing "bad date" issues in the repository and that I should run git fsck on it. So I cloned it from Bitbucket and ran git fsck, and this is what I get:

  $ git fsck
  Checking object directories: 100% (256/256), done.
  error in commit fda45b4b6b06f6b815341c1f26de827c769f48b6: badDate: invalid author/committer line - bad date
  error in commit 636d259fd0ac343af2a5561ff799a54a6aeb9b1c: badDate: invalid author/committer line - bad date
  error in commit 41dc786816992e3c42c904e8c848aa1078475386: badDate: invalid author/committer line - bad date
  error in commit c55a0fa0d98e02aa4621be202d7b7d21ed2ff2ab: badDate: invalid author/committer line - bad date
  error in commit e6ad8f5ea7cf6441b6ea6ab5583117113a8f49fb: badDate: invalid author/committer line - bad date
  error in commit 4aea97fdd999484319a9fbbc4dc42b024e1eba80: badDate: invalid author/committer line - bad date
  error in commit 531f7783e383868c1d52a1bf2dc3212f5e10a91c: badDate: invalid author/committer line - bad date
  Checking objects: 100% (546/546), done.

Well gosh, how did THAT happen? I have no idea how to even begin to fix that. Searching for "bad date" hasn't yielded any useful advice.

Would a kind git guru care to steer me in the right direction?


There are 2 answers

torek

"Well gosh, how did THAT happen?"

Someone used a buggy version of Git (or a buggy tool that builds Git objects). It is not possible to say who, when, or how from the fsck output alone, but examining the bad commits will probably offer some very big clues, since the author and committer lines should have this general form:

author A U Thor <[email protected]> 1575578639 -0800

or:

committer A U Thor <[email protected]> 1575578639 -0800

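The date-and-time stamp is the last two fields: a count of seconds since the Unix epoch, followed by a time-zone offset. The name and email address in front of them will likely tell you whom to ask about which Git version or tool they were using.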
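To make that shape concrete, here is a rough sketch of a checker for such lines. It is a simplification and not Git's actual fsck logic; the function name parse_ident and the address thor@example.com are made up for illustration.

```python
import re

# Simplified shape of a commit identity line (an approximation, not Git's parser):
#   <role> <name> <<email>> <unix-seconds> <+/-HHMM>
IDENT_RE = re.compile(
    r"^(author|committer) "   # header keyword
    r"([^<>\n]*) "            # display name, no angle brackets allowed
    r"<([^<>\n]*)> "          # e-mail address
    r"(\d+) "                 # seconds since the Unix epoch
    r"([+-]\d{4})$"           # time-zone offset
)

def parse_ident(line):
    """Return (role, name, email, seconds, tz), or None if the line is malformed."""
    m = IDENT_RE.match(line)
    if m is None:
        return None
    role, name, email, seconds, tz = m.groups()
    return role, name, email, int(seconds), tz

# Hypothetical well-formed line.
good = "author A U Thor <thor@example.com> 1575578639 -0800"
# Line with a doubled e-mail field, so digits do not follow the first <...>.
bad = ('author Richard W.M. Jones <rjones@xxxxxxxxxx> '
       '<"Richard W.M. Jones <rjones@xxxxxxxxxx>"> 1254739168 +0100')

print(parse_ident(good))
print(parse_ident(bad))
```

Running each bad commit's author and committer lines through something like this should show quickly which field is out of place.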

"I have no idea how to even begin to fix that."

Technically, you can't fix the bad commits themselves. The reason is that, bad or not, the raw data in the commit is the source of the hash ID of the commit. Since the hash ID is the real name of the commit, the real name of the commit requires that the commit's data be bad. If you did fix them, they would become different commits, which would have different hash-ID-names.
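A small sketch of why that is, assuming an ordinary SHA-1 repository: the object ID is the SHA-1 of a short header plus the raw object bytes, so touching even one digit of the date yields a different ID. The commit body below is invented for illustration (the tree ID is the well-known empty tree, and the identity is a placeholder).

```python
import hashlib

def git_object_id(obj_type, body):
    """Object ID for SHA-1 repositories: sha1(b"<type> <size>" + NUL + body)."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

# Made-up commit body; only the date differs between the two versions.
original = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <thor@example.com> 1575578639 -0800\n"
    b"committer A U Thor <thor@example.com> 1575578639 -0800\n"
    b"\n"
    b"Example commit\n"
)
fixed = original.replace(b"1575578639", b"1575578640")

print(git_object_id("commit", original))
print(git_object_id("commit", fixed))  # a different ID: a different commit
```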

As VonC said, to produce a new, incompatible, but corrected repository, you must replace each of these bad commits with new-and-improved ones, perhaps using git filter-branch or the newer git filter-repo. Whatever tool or method you use, you'll need to provide some way to replace the bad author and/or committer lines in the headers of the bad commits with new, correct lines, that is, lines whose date-and-time stamps meet Git's internal requirements.
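What "some way to replace the bad lines" looks like depends on how the lines are broken, which you only learn by inspecting them. As one hypothetical policy, the sketch below keeps the first name and first e-mail address it finds, keeps a trailing timestamp and offset if they are present, and otherwise falls back to a placeholder date. The function repair_ident is made up here and is not part of any Git tool.

```python
import re

def repair_ident(line, fallback="0 +0000"):
    """Rebuild an author/committer line as '<role> <name> <email> <seconds> <tz>'.

    Hypothetical repair policy, chosen only for illustration.
    """
    role, _, rest = line.partition(" ")
    # Name: everything before the first '<'.
    name = rest.split("<", 1)[0].strip()
    # E-mail: the first <...> group that contains no brackets or whitespace.
    email_match = re.search(r"<([^<>\s]*)>", rest)
    email = email_match.group(1) if email_match else ""
    # Date: a trailing '<digits> <+/-HHMM>' pair, if there is one.
    date_match = re.search(r"(\d+) ([+-]\d{4})$", rest)
    date = " ".join(date_match.groups()) if date_match else fallback
    return f"{role} {name} <{email}> {date}"

broken = ('author Richard W.M. Jones <rjones@xxxxxxxxxx> '
          '<"Richard W.M. Jones <rjones@xxxxxxxxxx>"> 1254739168 +0100')
print(repair_ident(broken))
# author Richard W.M. Jones <rjones@xxxxxxxxxx> 1254739168 +0100
```

A function like this would be the per-commit piece; the rewriting tool still has to apply it and re-create the commits.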

Having replaced the bad commits with corrected ones, you must now also replace every subsequent (descendant) commit, because the immediate children of these commits store their parent hash IDs (those of the bad commits) inside them as part of their data. So you must write up a new corrected child commit that preserves everything except the parent hash ID. That invalidates the child commit's children, so those too must be rewritten, and so on: all descendant commits.

This is just what these filter-branch / filter-repo tools do. You (somehow) pick out a bad commit in the repository, and they copy it to a new-and-improved commit instead. Then they copy all descendants of the original bad commit as well, so that there is a new family tree descending from the corrected commit.
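The copying process can be illustrated with a toy model (this is not how filter-branch or filter-repo are implemented, just the same idea in miniature). Each toy commit's ID is a hash of its parent IDs plus its payload, and commits are copied parent-first; all names here are invented.

```python
import hashlib

def commit_id(parents, payload):
    """Toy stand-in for a Git hash: depends on the parent IDs and the payload."""
    text = "\n".join(list(parents) + [payload])
    return hashlib.sha1(text.encode()).hexdigest()

def build(specs):
    """Build a toy history from (name, parent_names, payload), parents first."""
    ids, history = {}, []
    for name, parent_names, payload in specs:
        parents = [ids[p] for p in parent_names]
        ids[name] = commit_id(parents, payload)
        history.append((ids[name], parents, payload))
    return ids, history

def rewrite(history, fix):
    """Copy every commit, applying `fix` to its payload; return old ID -> new ID."""
    old_to_new = {}
    for old_id, parents, payload in history:
        new_parents = [old_to_new[p] for p in parents]  # point at the copies
        old_to_new[old_id] = commit_id(new_parents, fix(payload))
    return old_to_new

ids, history = build([
    ("A", [], "date=1575578639 root"),
    ("B", ["A"], "date=BAD feature"),            # the one bad commit
    ("C", ["B"], "date=1575578700 more work"),   # descendant of B
    ("D", ["A"], "date=1575578800 side branch"), # not a descendant of B
])
mapping = rewrite(history, lambda p: p.replace("date=BAD", "date=1575578650"))
changed = sorted(name for name, old in ids.items() if mapping[old] != old)
print(changed)  # ['B', 'C']
```

Only the bad commit and its descendant get new IDs; A and D come out byte-for-byte the same, so their IDs are unchanged.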

Since the set of commits in a repository is the history in that repository, the result of copying all of these commits is a whole new history—a new repository, that all users of the old repository must now switch over to use. Hence the technical part of correcting the repository is usually the easiest part of this whole process. It takes some work to figure out what's wrong and how to use the tools to rewrite history, but you do that once and you're done. But then you must track down every user of the old repository and somehow convince them to stop using that one and start using the new and improved one instead.

VonC

As illustrated here, git cat-file -p lets you see the raw author/committer lines of a commit.

Example:

'git log' doesn't show the commit object's headers as-is, but 'git cat-file' does:

  $ git cat-file -p c609265ccce27a902b850f5d62e6628710ee2928
  tree ea8e3bd64e67d159e706b8feccfc169f6ac0829d
  parent db4e80ee11a4e212a97efc1761ed237c7da72cb1
  author Richard W.M. Jones <rjones@xxxxxxxxxx> <"Richard W.M. Jones <rjones@xxxxxxxxxx>"> 1254739168 +0100
  committer Richard W.M. Jones <rjones@xxxxxxxxxx> <"Richard W.M. Jones <rjones@xxxxxxxxxx>"> 1254739168 +0100

  New translations.

Since the Git documentation now warns against using git filter-branch and recommends alternatives, consider newren/git-filter-repo to fix those authors/committers, as in here.

But: the history will be rewritten, which might be OK in your case (migration).