The scenario

I'm trying to remove some files from the entire history of a git repository. They all share a couple of common criteria:

  • They have "settings" in the file name. They may also have various prefixes and suffixes, though.
  • They will be two levels deep inside a certain directory in the file tree. The names of the second level directories vary. There are settings files deeper in the file tree that should not be removed.

Here, then, is an example of the file tree:

root-directory/
  |-> apples/
  |     |-> bad-settings-alpha.txt
  |     |-> bad-settings-beta.txt
  |
  |-> oranges/
  |     |-> bad-settings-gamma.txt
  |     |-> bad-settings-delta.txt
  |     |-> navels/
  |           |-> good-settings.txt
  |
  |-> good-settings.txt

I need to filter out all the bad-settings files while keeping the good-settings files.

My approach

So, using a tutorial provided by GitHub, in conjunction with the manpage for git-rm, I crafted this command (split across two lines):

git filter-branch -f --index-filter 'git rm --dry-run --cached \
--ignore-unmatch root-directory/*/*settings*.txt' --prune-empty -- --all

Of particular note here is the file glob I used: root-directory/*/*settings*.txt. If I use that file glob with ls, then I get exactly the list of files I want to remove. So, it should work, right?

Apparently not. If I run my command with that glob, it also takes out all the settings files more than two levels deep. In the file tree example above, that means that root-directory/oranges/navels/good-settings.txt would get nuked.
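One likely explanation can be sketched with plain shell, no git involved. The filter string is single-quoted, so the interactive shell never expands the glob, and filter-branch appears to evaluate an index filter in a temporary directory with no files checked out. A glob that matches nothing is left as a literal string, so git would receive the raw pattern rather than a file list. The scratch directories and variable names below are made up for illustration, and the sketch assumes bash or a POSIX sh with default options (no nullglob or failglob).

```shell
# Rebuild the example tree in a scratch directory (hypothetical paths).
tree=$(mktemp -d)
mkdir -p "$tree/root-directory/apples" "$tree/root-directory/oranges/navels"
touch "$tree/root-directory/apples/bad-settings-alpha.txt" \
      "$tree/root-directory/apples/bad-settings-beta.txt" \
      "$tree/root-directory/oranges/bad-settings-gamma.txt" \
      "$tree/root-directory/oranges/bad-settings-delta.txt" \
      "$tree/root-directory/oranges/navels/good-settings.txt" \
      "$tree/root-directory/good-settings.txt"

# Inside the tree, the shell expands the glob to the two-level files only.
in_tree=$(cd "$tree" && printf '%s\n' root-directory/*/*settings*.txt)

# In an empty directory nothing matches, so the pattern stays literal.
empty=$(mktemp -d)
in_empty=$(cd "$empty" && printf '%s\n' root-directory/*/*settings*.txt)

printf 'expanded inside the tree:\n%s\n\n' "$in_tree"
printf 'left alone in an empty directory:\n%s\n' "$in_empty"
```

The second result is the unexpanded pattern itself, which is what a command run outside the checked-out tree would hand to git rm.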


I've tried to solve this on my own, trying variations on the file glob and using the wonderful --dry-run option for git-rm. Nothing seemed to work--all I could figure out how to do was change the file tree depth at which I started removing settings files.

I did find one thing that seemed extremely relevant to my problem. In the manpage for git-rm, there is this example:

git rm Documentation/\*.txt
  Removes all *.txt files from the index that are under the Documentation
  directory and any of its subdirectories.

  Note that the asterisk * is quoted from the shell in this example; this
  lets git, and not the shell, expand the pathnames of files and
  subdirectories under the Documentation/ directory.

"Removes all...files from the index that are under the...directory and any of its subdirectories" is consistent with what's actually happening. What's really interesting is the mention of the quoted asterisk. I understand that this lets git-rm handle the file glob expansion as opposed to bash. Okay. But that leaves these questions:

  • Why would I want to do that?
  • I'm not quoting my asterisks, so bash should be doing the expansion. If that's true, and my file glob works with ls, then why isn't it working with git-rm?

I've also seen the example directly beneath the one above, and it seems to do what I'm trying to do. And yet, that does not happen for me, or else I would not be here. It does seem to confirm that I want to do file expansion with bash, though.
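On the question of who expands the pattern: according to the git pathspec documentation, git's default matching lets * match directory separators too, which fits the Documentation/\*.txt example, whereas the :(glob) pathspec magic keeps * within a single directory level. A minimal sketch on a scratch repository, using git ls-files to list what each pattern matches in the index. The scratch path and variable names are hypothetical, and it assumes a git version that supports pathspec magic.

```shell
# Scratch repository mirroring the example tree (hypothetical path).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
mkdir -p root-directory/apples root-directory/oranges/navels
touch root-directory/apples/bad-settings-alpha.txt \
      root-directory/apples/bad-settings-beta.txt \
      root-directory/oranges/bad-settings-gamma.txt \
      root-directory/oranges/bad-settings-delta.txt \
      root-directory/oranges/navels/good-settings.txt \
      root-directory/good-settings.txt
git add .

# Default pathspec: * may also match "/", so the navels file is included.
plain=$(git ls-files -- 'root-directory/*/*settings*.txt')

# :(glob) magic: * stops at "/", so only the two-level files match.
strict=$(git ls-files -- ':(glob)root-directory/*/*settings*.txt')

printf 'plain pathspec:\n%s\n\n' "$plain"
printf ':(glob) pathspec:\n%s\n' "$strict"
```

If that holds for git rm as well, the index filter could pass ":(glob)root-directory/*/*settings*.txt" (double-quoted inside the single-quoted filter string) instead of the bare pattern. Trying it with --dry-run first seems wise.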

There is 1 answer

Fatih Arslan

Why not use find to show two level deep files:

find . -maxdepth 2 -mindepth 2 -type f -name "bad-settings*"

This will give you the list of bad-settings files that sit exactly two levels deep (relative to the directory you run it in, so start from inside root-directory). You can then pipe them to git rm via xargs:

find . -maxdepth 2 -mindepth 2 -type f -name "bad-settings*" | xargs git rm
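To check what that find invocation selects on the example tree, here is a small sketch. It assumes a find that supports -mindepth and -maxdepth (GNU and BSD find do), and it starts from inside root-directory so that depth 2 corresponds to the files in apples/ and oranges/. The scratch path and the variable name are hypothetical.

```shell
# Rebuild the example tree in a scratch directory (hypothetical path).
tree=$(mktemp -d)
mkdir -p "$tree/root-directory/apples" "$tree/root-directory/oranges/navels"
touch "$tree/root-directory/apples/bad-settings-alpha.txt" \
      "$tree/root-directory/apples/bad-settings-beta.txt" \
      "$tree/root-directory/oranges/bad-settings-gamma.txt" \
      "$tree/root-directory/oranges/bad-settings-delta.txt" \
      "$tree/root-directory/oranges/navels/good-settings.txt" \
      "$tree/root-directory/good-settings.txt"

# Start inside root-directory so that depth 2 means <subdir>/<file>.
cd "$tree/root-directory" || exit 1

# Only regular files exactly two levels down whose names start with bad-settings.
found=$(find . -mindepth 2 -maxdepth 2 -type f -name "bad-settings*" | sort)
printf '%s\n' "$found"
```

In a real repository the same list can be handed to git rm. Using find ... -print0 | xargs -0 git rm tolerates spaces in file names. Note that find needs the files on disk, so within filter-branch this probably belongs in a --tree-filter rather than an --index-filter.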