Git: list case-sensitive paths that have collided during clone


When cloning a Git repository that contains file paths differing only by case (e.g. /README.md and /readme.md) onto a file system that is case-insensitive by default (like NTFS or APFS), Git will only check out one of the colliding files.

In macOS (or Windows), how can I list all the files that collided because of case insensitivity?


There are 2 answers

torek On

There is nothing built in to find this. phd's comment will get you close, possibly close enough, but may over-report a bit (although you might still like to know about those cases).

For instance, suppose some commit has files:

path/TO/file1.ext
path/to/file2.ext

On your file system, only one of path/TO or path/to can exist. Once one of them exists, both files are dropped into the same path/$to folder, where $to is either lowercase or uppercase. They will still be separate files, but will be called out by case-folding the paths and running the result through sort | uniq -c.
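If you want to see this directory-level case separately from true file collisions, something like the following sketch might help. It is not from phd's comment; the function name colliding_dirs is made up for illustration. It assumes newline-separated paths on stdin (as printed by git ls-files) and a tr that may only fold ASCII letters.

```shell
# Hypothetical helper: print every directory (case-folded) that appears
# in the path list under more than one case spelling.
colliding_dirs() {
  # 1. Emit every directory prefix of every path (a/b/c.txt -> a, a/b).
  awk -F/ '{
    p = ""
    for (i = 1; i < NF; i++) {
      p = (i > 1 ? p "/" : "") $i
      print p
    }
  }' |
    # 2. Keep each exact spelling once.
    LC_ALL=C sort -u |
    # 3. Fold case; spellings that differ only by case now become duplicates.
    tr '[:upper:]' '[:lower:]' |
    LC_ALL=C sort |
    # 4. Print only the folded names that occur more than once.
    uniq -d
}

# Usage, from inside a work tree:
#   git ls-files | colliding_dirs
```

For the two example paths above this prints path/to, which tells you that the two files end up sharing one folder on disk even though neither file name itself collides.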

On macOS, we can also have collisions in paths due to Unicode normalization. Linux considers a file named 's' 'c' 'h' 'o' 'combining-umlaut' 'n' to be one file name, and a file named 's' 'c' 'h' 'o-with-umlaut' 'n' to be a second, different file name. The macOS default file systems will turn both names into a common form and claim that this is just one name. (I have no idea what Windows does with this.) A proper tool should take this into account as well.
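To make the two spellings concrete, here is a small sketch that builds both names from raw UTF-8 bytes and compares them. The variable names nfc and nfd are just labels chosen here (after the usual names of the two normalization forms); nothing in this snippet is specific to Git.

```shell
# Two spellings of the same visible name, "schön", as raw UTF-8 bytes.
nfc=$(printf 'sch\303\266n')    # precomposed o-with-umlaut (U+00F6)
nfd=$(printf 'scho\314\210n')   # plain o followed by combining diaeresis (U+0308)

# A byte-wise string comparison treats them as two different names,
# which is how Git and most Linux file systems see them.
if [ "$nfc" = "$nfd" ]; then
  echo "same bytes"
else
  echo "different bytes"
fi
# prints: different bytes
```

Note that lowercasing does nothing to reconcile these two strings, so a check that only folds case will not flag this kind of pair.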

Note that Git stores each file as a separate index entry, and can update each entry from a file in the working tree regardless of that file's path name there. So we could have Git build a mapping from internal name to external name and make it handle these cases all automatically. But that's a pretty big task.

TTT On

I just had a scenario where I needed to find all files (or paths) at a commit that differed by case only. I ended up using this (in Bash):

# Quote the character classes so the shell cannot glob-expand them.
diff \
  <(git ls-files | tr '[:upper:]' '[:lower:]' | sort) \
  <(git ls-files | tr '[:upper:]' '[:lower:]' | sort -u)

# and then to list the exact paths, for each file listed by the above diff:
git ls-files | grep -i <file-listed-in-above-diff>

Explanation: Within the diff command, the first process substitution takes the output of ls-files, converts all uppercase to lowercase, and then sorts it. The second does the same but also removes the duplicates. Diffing those two outputs all of the dups (in lowercase), and then you can run the grep -i to see the exact paths of them all.
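If you would rather skip the manual grep -i step, the two steps above could be folded into one pass along these lines. This is only a sketch: the function name list_case_collisions is made up, it assumes newline-separated paths without embedded tabs, and awk's tolower may only fold ASCII letters depending on your awk and locale.

```shell
# Hypothetical helper: read paths on stdin and print, with their original
# spelling, every path that has at least one case-only twin.
list_case_collisions() {
  # Decorate each path with its lowercased form as a sort key.
  awk '{ print tolower($0) "\t" $0 }' |
    # Byte-wise sort puts identical keys on adjacent lines.
    LC_ALL=C sort |
    # Print the original path of every line whose key repeats.
    # Appending "" forces string (not numeric) comparison in awk.
    awk -F '\t' '
      NR > 1 && ($1 "") == prev {
        if (!shown) print held
        print $2
        shown = 1
        next
      }
      { prev = $1 ""; held = $2; shown = 0 }
    '
}

# Usage, from inside a work tree:
#   git ls-files | list_case_collisions
```

One caveat: git ls-files quotes paths containing unusual characters by default, so for repositories with non-ASCII names you may want to run it as git -c core.quotePath=false ls-files before piping into the function.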

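Side Note: ls-files lists the paths in the index, which normally matches your currently checked-out commit; it doesn't search through every branch in the repo. Normally you'll probably want to check out the latest version of the default branch before running this. (To inspect some other commit without checking it out, git ls-tree -r --name-only <commit> lists that commit's paths instead.)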