ripgrep to find files which contain multiple words, possibly on different lines?


If I have files like:

cat file1.txt
foo
bar
cat file2.txt
foo
baz
cat file3.txt
bar
baz

Is there a ripgrep command (or similar) that will find files containing both foo and bar, i.e. display file1.txt but not the other two files? (Note that foo and bar might not be on the same line.)

And a second question, to get even fancier: is there some syntax to find files with foo but exclude them if they also contain bar, so that it would only display file2.txt?

Thanks!


There are 2 answers

0
Chris Stryczynski On

Because rg can be passed a list of files to search, you can just run a second search on the results of the first:

rg "foo" $(rg "bar" -l)

This searches for files that have both "bar" and "foo" in them.
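One caveat with the command substitution above is that the file list is split on whitespace, so paths containing spaces break, and if no file contains "bar" the outer rg receives no file arguments at all. A minimal sketch of a NUL-delimited variant follows, assuming an xargs that supports -0 and -r (-r is a GNU extension); the scratch directory and variable names are only for illustration:

```shell
# Recreate the three sample files from the question in a scratch directory.
dir=$(mktemp -d)
printf 'foo\nbar\n' > "$dir/file1.txt"
printf 'foo\nbaz\n' > "$dir/file2.txt"
printf 'bar\nbaz\n' > "$dir/file3.txt"

# Step 1: list files containing "bar", each path terminated by a NUL byte.
# Step 2: among those files, list the ones that also contain "foo".
# -r (assumed GNU xargs) skips step 2 entirely when step 1 finds nothing.
both=$(rg -l --null 'bar' "$dir" | xargs -0 -r rg -l 'foo')
echo "$both"
```

Using -l on the second search prints only file names rather than the matching lines.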

1
Bill Wadley On

RipGrep uses the Rust RegEx crate by default. It is quite good, but it lacks a couple of features, namely look-ahead and back references. RipGrep also supports PCRE2, which does have look-ahead and back references, but it must be specifically compiled in (Rust must be installed):

$ cargo build --release --features 'pcre2'

Regarding searching in files for text that may be separated by newlines, default ripgrep provides a multiline option:

$ rg --multiline 'foo.*bar' # can be shortened to -U

However, the . character type matches anything except newlines, so they have to be matched specifically:

$ rg -U 'foo.*[\n\r]*.*bar' *.txt # troublesome...

This becomes problematic with multiple lines involved, so another technique is to use the --multiline-dotall option to tell the . to also match newlines:

$ rg -U --multiline-dotall 'foo.*bar' *.txt

or put the inline flag (?s) in the pattern itself, which has the same effect:

$ rg -U '(?s)foo.*bar' *.txt

Result:

$ echo -e 'foo\nbar' > file1.txt
$ echo -e 'foo\nbaz' > file2.txt
$ echo -e 'bar\nbaz' > file3.txt

$ rg -U '(?s)foo.*bar' file*.txt
file1.txt
1:foo
2:bar
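Note that 'foo.*bar' only matches when foo comes before bar in the file. If the order should not matter, one option is to add the reversed alternative to the pattern. The following is a sketch under that assumption; the scratch files a.txt, b.txt and c.txt are hypothetical and exist only to show the difference:

```shell
# Three scratch files: foo before bar, bar before foo, and foo without bar.
dir=$(mktemp -d)
printf 'foo\nbar\n' > "$dir/a.txt"
printf 'bar\nfoo\n' > "$dir/b.txt"
printf 'foo\nbaz\n' > "$dir/c.txt"

# Order-sensitive: only files where foo is followed (possibly lines later) by bar.
ordered=$(rg -U --multiline-dotall -l 'foo.*bar' "$dir" | sort)

# Order-insensitive: accept either foo...bar or bar...foo.
either=$(rg -U --multiline-dotall -l 'foo.*bar|bar.*foo' "$dir" | sort)

echo "ordered: $ordered"
echo "either:  $either"
```

Here -l limits the output to file names, which is what the question asks for.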

In order to find all files with 'foo' but not also 'bar' afterwards, it will be necessary to use look-ahead, or, more specifically, negative look-ahead. Only RipGrep with PCRE2 support compiled in will work:

$ rg -U --pcre2 '(?s)foo.*^(?!.*bar)'
file2.txt
1:foo
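Because the look-ahead only checks the text after foo, a file in which bar appears earlier can still be reported. If PCRE2 is not available, or the order should not matter, a sketch of an alternative is to chain two plain searches: first list the files that do not contain bar, then keep those that contain foo. This assumes an xargs that supports -0 and -r (-r is a GNU extension); the scratch directory is only for illustration:

```shell
# Recreate the three sample files from the question in a scratch directory.
dir=$(mktemp -d)
printf 'foo\nbar\n' > "$dir/file1.txt"
printf 'foo\nbaz\n' > "$dir/file2.txt"
printf 'bar\nbaz\n' > "$dir/file3.txt"

# Step 1: list files with no match for "bar", NUL-terminated.
# Step 2: among those, list the files that contain "foo".
only_foo=$(rg --files-without-match --null 'bar' "$dir" | xargs -0 -r rg -l 'foo')
echo "$only_foo"
```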