I want to search for a byte pattern in a large number of files.
My hex pattern is: BF 01 03 33 C0 F3 AB 5F 07
This is an example of a binary file that contains the pattern: https://file.io/fssCU8dteF75
...01 50 8B C3 8B CB 8B D3 8B EB 8B F3 8B FB FB CB 06 57 B9 80 00 8C D8 8E C0 BF 01 03 33 C0 F3 AB 5F 07 06 33 C0 8E C0 26 2B 3E 04 00 07 8B C7 D1 E8 D1 E8 D1 E8 D1 E8 8C C1 03 C8 8E C1 83 E7 0F...
I'm using version 13.0.0 of ripgrep for Windows (ripgrep-13.0.0-x86_64-pc-windows-msvc.zip) from https://github.com/BurntSushi/ripgrep
I've tried several variants, but I'm only able to find a single byte, not the whole pattern:
rg --binary -uuu "\xBF" finds the wanted files, but also other files that contain BF
rg --binary -uuu "\xBF\x01\0x03\0x033" finds nothing
rg --binary -uuu (?-u:\xBF\x01\x03\x33\xC0\xF3\xAB) seems to work, but I'm unclear whether some sort of unwanted Unicode interpretation is happening with my search bytes
Any idea how to get the hex/byte pattern matching working?
rg '\xBF' will search for the UTF-8 encoding of the Unicode codepoint U+00BF, which corresponds to the byte sequence \xC2\xBF. Conversely, rg '(?-u)\xBF' will search for the byte \xBF verbatim. Equivalently, rg '(?-u:\xBF)' or even rg --no-unicode '\xBF'.
Essentially, when Unicode mode is enabled, ripgrep treats all escape sequences as references to Unicode codepoints. This is consistent with the fact that, when Unicode mode is enabled, the fundamental atom of matching is the codepoint. Therefore, when Unicode mode is enabled, it is impossible to match invalid UTF-8.
When Unicode mode is disabled, the fundamental atom of matching is changed to a single byte. In this mode, you can match arbitrary byte sequences, including invalid UTF-8.
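To make the difference concrete, here is a small Python sketch. It is not ripgrep and does not use ripgrep's regex engine; it only imitates the two interpretations with plain byte operations. The names data, needle, as_codepoints and as_bytes are made up for this illustration, and data is a short excerpt of the sample bytes from the question.

```python
# Illustration only: plain Python bytes stand in for ripgrep's matching.

# Excerpt of the sample bytes quoted in the question, and the wanted pattern.
data = bytes.fromhex("B9 80 00 8C D8 8E C0 BF 01 03 33 C0 F3 AB 5F 07 06 33 C0 8E C0")
needle = bytes.fromhex("BF 01 03 33 C0 F3 AB 5F 07")

# Unicode mode: each \xNN names a codepoint, so what gets searched for is the
# UTF-8 encoding of that codepoint (for example, U+00BF becomes C2 BF).
as_codepoints = needle.decode("latin-1").encode("utf-8")

# Byte mode, as with (?-u): each \xNN is matched as that single raw byte.
as_bytes = needle

print(as_codepoints.hex(" "))     # the bytes a Unicode-mode search would look for
print(data.find(as_codepoints))   # -1 means "not found"
print(data.find(as_bytes))        # offset of the raw byte pattern in the excerpt
```

The Unicode-mode needle is longer than the original pattern because every value above 7F turns into two bytes, which is why it cannot match the raw data.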
Also, if you're searching binary data, you might want to use the -a/--text flag. That will force all binary data to be treated "as if" it were plain text. With that said, using --binary will ensure that you won't miss any matches.
Otherwise, the --binary flag in rg -uuu --binary is strictly superfluous. Why? Because -uuu is equivalent to --no-ignore --hidden --binary.
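If you want to cross-check which files ripgrep reports, a rough independent check in Python could look like the sketch below. It is an assumption-laden stand-in, not a replacement for rg: files_containing is a hypothetical helper name, it reads each file fully into memory, and it visits every file under the directory without applying any ignore rules (roughly the spirit of -uuu).

```python
import os


def files_containing(root, needle):
    """Return sorted paths under root whose raw bytes contain needle."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Read in binary mode so no text decoding is applied.
                with open(path, "rb") as fh:
                    if needle in fh.read():
                        hits.append(path)
            except OSError:
                # Skip files that cannot be opened or read.
                continue
    return sorted(hits)


if __name__ == "__main__":
    pattern = bytes.fromhex("BF 01 03 33 C0 F3 AB 5F 07")
    for hit in files_containing(".", pattern):
        print(hit)
```

For very large files, reading in chunks (with an overlap of at least the pattern length minus one) would be kinder to memory; the version above keeps things simple.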