How to search recursively for a hex pattern in files with ripgrep

I want to search for a byte pattern in a large number of files.

My hex pattern is: BF 01 03 33 C0 F3 AB 5F 07

This is an example of a binary file that contains the pattern: https://file.io/fssCU8dteF75

...01 50 8B C3 8B CB 8B D3 8B EB 8B F3 8B FB FB CB 06 57 B9 80 00 8C D8 8E C0 BF 01 03 33 C0 F3 AB 5F 07 06 33 C0 8E C0 26 2B 3E 04 00 07 8B C7 D1 E8 D1 E8 D1 E8 D1 E8 8C C1 03 C8 8E C1 83 E7 0F...

I am using version 13.0.0 of ripgrep from https://github.com/BurntSushi/ripgrep for Windows (ripgrep-13.0.0-x86_64-pc-windows-msvc.zip).

I've tried several variants, but I'm only able to find a single byte, not the whole pattern.

rg --binary -uuu "\xBF" finds the wanted files, but also other files that contain BF.

rg --binary -uuu "\xBF\x01\0x03\0x033" finds nothing.

rg --binary -uuu (?-u:\xBF\x01\x03\x33\xC0\xF3\xAB) seems to work, but I'm unclear whether some unwanted Unicode interpretation is happening with my search bytes.

Any idea how to get the hex/byte pattern matching working?

There is 1 answer

BurntSushi5

rg '\xBF' will search for the UTF-8 encoding of the Unicode codepoint U+00BF, which corresponds to the byte sequence \xC2\xBF.

Conversely, rg '(?-u)\xBF' will search for the byte \xBF verbatim. Equivalently, rg '(?-u:\xBF)' or even rg --no-unicode '\xBF'.
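
The difference between the codepoint U+00BF and the raw byte \xBF can be checked outside ripgrep. The following is a minimal Python sketch, not anything ripgrep itself runs; the variable names are made up for illustration.

```python
# U+00BF as a Unicode codepoint versus 0xBF as a raw byte.
codepoint = "\u00bf"
utf8_bytes = codepoint.encode("utf-8")
print(utf8_bytes.hex(" ").upper())  # C2 BF

# A lone 0xBF is a UTF-8 continuation byte, so on its own it is
# not valid UTF-8 and cannot be decoded.
raw_byte = bytes([0xBF])
try:
    raw_byte.decode("utf-8")
    raw_is_valid_utf8 = True
except UnicodeDecodeError:
    raw_is_valid_utf8 = False
print(raw_is_valid_utf8)  # False
```

This is why a Unicode-mode search for '\xBF' looks for the two bytes C2 BF and can miss a file that only contains a bare BF.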

Essentially, when Unicode mode is enabled, ripgrep treats all escape sequences as references to Unicode codepoints. This is consistent with the fact that, when Unicode mode is enabled, the fundamental atom of matching is the codepoint. Therefore, when Unicode mode is enabled, it is impossible to match invalid UTF-8.

When Unicode mode is disabled, the fundamental atom of matching is changed to a single byte. In this mode, you can match arbitrary byte sequences, including invalid UTF-8.
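
As a rough illustration, a space-separated hex string such as the one in the question can be turned into a pattern of the (?-u:...) form with a few lines of Python. The helper name hex_to_rg_pattern is hypothetical and not part of ripgrep. The second half uses Python's own bytes regexes as an analogy for byte-at-a-time matching; it is not ripgrep's regex engine.

```python
import re


def hex_to_rg_pattern(hex_string: str) -> str:
    """Turn 'BF 01 03' into the text (?-u:\\xBF\\x01\\x03) (hypothetical helper)."""
    data = bytes.fromhex(hex_string)  # spaces between the hex pairs are skipped
    escaped = "".join(f"\\x{b:02X}" for b in data)
    return f"(?-u:{escaped})"


pattern = hex_to_rg_pattern("BF 01 03 33 C0 F3 AB 5F 07")
print(pattern)  # (?-u:\xBF\x01\x03\x33\xC0\xF3\xAB\x5F\x07)

# Analogy only: a bytes pattern in Python also matches one byte at a time,
# so it can find sequences that are not valid UTF-8.
needle = bytes.fromhex("BF 01 03 33 C0 F3 AB 5F 07")
sample = bytes.fromhex("8E C0 BF 01 03 33 C0 F3 AB 5F 07 06 33 C0")
match = re.search(re.escape(needle), sample)
print(match.start())  # 2
```

The printed pattern is meant to be passed to rg as a single quoted argument, in the same way as the (?-u:...) examples above.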

Also, if you're searching binary data, you might want to use the -a/--text flag. That will force all binary data to be treated "as if" it were plain text. With that said, using --binary will ensure that you won't miss any matches.

Otherwise, the --binary flag in rg -uuu --binary is strictly superfluous. Why? Because -uuu is equivalent to --no-ignore --hidden --binary.
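
To cross-check which files rg reports, a plain byte search can be sketched in Python. This is an assumption-laden sanity check, not a replacement for ripgrep: the function name files_containing is hypothetical, it reads each file fully into memory, and, like -uuu, it does not skip hidden or ignored files.

```python
import tempfile
from pathlib import Path

NEEDLE = bytes.fromhex("BF 01 03 33 C0 F3 AB 5F 07")


def files_containing(root: Path, needle: bytes) -> list:
    """Return every file under root whose raw bytes contain needle."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            data = path.read_bytes()  # whole file in memory; fine for a spot check
        except OSError:
            continue  # unreadable file: skip it
        if needle in data:
            hits.append(path)
    return hits


# Small self-contained demo on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "sub" / "hit.bin").write_bytes(b"\x8e\xc0" + NEEDLE + b"\x06")
    (root / "miss.bin").write_bytes(b"\xbf\x01\x03")
    found = [p.name for p in files_containing(root, NEEDLE)]
print(found)  # ['hit.bin']
```

Calling files_containing(Path("."), NEEDLE) in the directory being searched should list the same files as the rg command, provided the rg pattern covers all nine bytes.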