I am trying to identify large files in my project's history that have since been deleted. The following pipeline gives me a list of blobs ordered by size:
$ git rev-list --objects --all \
| git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
| sed -n 's/^blob //p' \
| sort --numeric-sort --key=2 \
| cut -c 42-
Usually, I can remove a file from history without issue like this:
git filter-repo --force --invert-paths --path <path-to-file>
However, if a file had a different name or path at some point in the past, removing it this way is not enough: when I run the rev-list pipeline again, I see the same blob, with the same size, listed under one of its earlier paths or names.
Is there a way to see all file paths of objects with rev-list so that I don't have to iterate with path/name changes in the history like this?
A not very satisfying way would be to run git ls-tree -r on every commit in your repo and grep for the hash of the blob you are looking for. The answer to this question could help:
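As a rough sketch of that brute-force approach (blob_paths is a made-up helper name, not a git command, and this will be slow on large histories because it lists the full tree of every commit):

```shell
#!/bin/sh
# blob_paths is a hypothetical helper, not part of git.
# Usage: blob_paths <blob-hash>
# Prints every distinct path at which the blob appears in any commit
# reachable from any ref.
blob_paths() {
    # Expand an abbreviated hash to the full id, since ls-tree prints full ids.
    blob=$(git rev-parse --verify "$1") || return 1

    git rev-list --all |
    while read -r commit; do
        # ls-tree -r prints: <mode> <type> <object>TAB<path>
        git ls-tree -r "$commit" |
        awk -v b="$blob" '$3 == b { sub(/^[^\t]*\t/, ""); print }'
    done |
    sort -u
}
```

Running blob_paths <blob-hash> inside the repository prints each path once. You could then hand all of them to git filter-repo in a single run by repeating the --path argument.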
If you replace the 0|1 result of the check_tree function in that answer with the list of paths at which the blob is found within the tree, the memoization of the function should still work just fine.
Note that the git filter-repo command you run removes every version of the file located at <path-to-file> from your history, not just the versions whose content exactly matches the blob you identified.
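If the goal is to drop only that exact blob, whatever path it lived at, git filter-repo also documents a --strip-blobs-with-ids option that reads object ids from a file, one per line. A minimal sketch, assuming your filter-repo version has that option (strip_blobs is a hypothetical wrapper name of my own):

```shell
#!/bin/sh
# strip_blobs is a hypothetical wrapper, not part of git or git-filter-repo.
# Usage: strip_blobs <blob-hash>...
# Assumes git filter-repo supports --strip-blobs-with-ids.
strip_blobs() {
    ids_file=$(mktemp) || return 1

    # Write one full object id per line into the temporary file.
    for id in "$@"; do
        git rev-parse --verify "$id" || { rm -f "$ids_file"; return 1; }
    done > "$ids_file"

    # Rewrite history, dropping the listed blobs wherever they occur.
    git filter-repo --force --strip-blobs-with-ids "$ids_file"
    status=$?

    rm -f "$ids_file"
    return "$status"
}
```

As with any filter-repo run, try this on a fresh clone first, since it rewrites history.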