Removing files from Git history that have changed name or path


I am trying to identify large files in the history of my project that have since been deleted. The following command gives me a list of blobs ordered by size:

$ git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  | sed -n 's/^blob //p' \
  | sort --numeric-sort --key=2 \
  | cut -c 42-

Usually, I can remove a file from history without issue like this:

git filter-repo --force --invert-paths --path <path-to-file>

However, if a file had a different name or path earlier in the history, this is not enough: after I remove the file and run the rev-list pipeline again, I see the same file, with the same size, listed under a name or path it had in the past.

Is there a way to make rev-list show every path an object has had, so that I don't have to repeat the removal for each path or name change in the history?

Answer by LeGEC (best answer)

A not very satisfying way would be to run git ls-tree -r on every commit in your repo, and grep for the hash of the blob you are looking for:

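# $hash holds the object id of the blob you are looking for
git rev-list --all | xargs -L1 git ls-tree -r | grep "$hash"
# you can also replace 'xargs' with 'parallel'

# you can append a filter to avoid noisy repetitions of the same path
# (ls-tree puts a tab before the path, so splitting on tabs keeps paths with spaces whole):
... | awk -F'\t' '!seen[$2]++ { print $2 }'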
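
The same idea can be extended to every blob at once, which is closer to what the question asks for. What follows is only a sketch under a few assumptions: the function name list_blob_paths is made up for illustration, paths are assumed not to contain characters that Git would quote (see core.quotePath), and since it runs git ls-tree -r -l on every commit it will be slow on a large history.

```shell
# Sketch: print "size<TAB>blob id<TAB>path" for every distinct (blob, path)
# pair reachable from any ref, smallest blobs first.
# list_blob_paths is a hypothetical helper name, not a git command.
list_blob_paths() {
  git rev-list --all |
    while read -r commit; do
      # -l adds the object size; the path follows a tab
      git ls-tree -r -l "$commit"
    done |
    awk -F'\t' '
      {
        # $1 is "<mode> <type> <object> <size>", $2 is the path
        split($1, meta, " ")
        if (meta[2] != "blob") next        # skip submodule entries
        key = meta[3] "\t" $2
        if (!seen[key]++) print meta[4] "\t" meta[3] "\t" $2
      }' |
    sort -n -k1,1
}
```

Run inside the repository, for example as list_blob_paths | tail -n 20, a blob that was renamed or moved shows up once per path it has had. Those paths can then all be handed to a single git filter-repo run by repeating the --path option.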

The answer to this question could help:

If you replace the 0|1 result of the check_tree function with the list of paths at which the blob is found within the tree, the memoization of the function should still work just fine.


Note that the git filter-repo command you run removes every version of the file located at <path-to-file> from your history, not just the versions whose content exactly matches the blob you identified.
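
To see what removing a path would actually take out, one option is to list every distinct blob that has ever lived at that path before running filter-repo. The helper below is a minimal sketch, assuming the path names a file rather than a directory; path_versions is a hypothetical name chosen for illustration.

```shell
# Sketch: print "blob id size" for each distinct version of the file
# that has ever existed at the given path, across all refs.
# path_versions is a hypothetical helper name, not a git command.
path_versions() {
  # --full-history avoids dropping side branches during history simplification
  git rev-list --all --full-history -- "$1" |
    while read -r commit; do
      # prints nothing for commits where the path no longer exists
      git rev-parse --verify --quiet "${commit}:$1"
    done |
    sort -u |
    git cat-file --batch-check='%(objectname) %(objectsize)'
}
```

Calling path_versions <path-to-file> prints one line per version; if more than one line comes back, the --path removal will drop all of them. When only specific blobs should go, git filter-repo also has a --strip-blobs-with-ids option that reads object ids from a file, which may be a better fit.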