There are three identical .txt files in one folder, each containing a single word: "hello". The first file is encoded in UTF-8, the second in UTF-16, and the last in UTF-32 (all files were created on Linux). But running
grep -i "hello" *.txt
returns only one result, the UTF-8 file. grep does not find the other two files.
How can I grep a folder that partially contains UTF-16 or UTF-32 encoded files?
One way uses perl instead of grep, with the obvious change for the UTF-32 files.
This tells perl to use UTF-8 for output, to decode files opened for reading as UTF-16, and to print only the lines that (case-insensitively) match the regular expression inside the //'s.

Or use iconv to convert the file to UTF-8 first.

If you don't have an easy way to tell from the filename what encoding a file uses, maybe something like a script that uses file to try to guess the encoding and then iconv to convert to UTF-8 to feed to GNU grep.

Usage:
smartgrep hello *.txt
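Neither the iconv command nor the script is shown here, so the following is only a sketch of both ideas, not the original code. It assumes `file` supports `--mime-encoding`, that grep is GNU grep (for `--label`), and that the UTF-16/UTF-32 files carry a byte-order mark. The body of `smartgrep` and the sample file names are hypothetical. Older versions of `file` may not recognize UTF-32 and can report it as UTF-16, in which case those files will not match.

```shell
# Hypothetical sample files in a scratch directory.
tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/utf8.txt"
printf 'hello\n' | iconv -f UTF-8 -t UTF-16 > "$tmp/utf16.txt"
printf 'hello\n' | iconv -f UTF-8 -t UTF-32 > "$tmp/utf32.txt"

# Simple case: the encoding is known, so convert and pipe to grep.
iconv -f UTF-16 -t UTF-8 < "$tmp/utf16.txt" | grep -i "hello"

# smartgrep PATTERN FILE...
# Guess each file's encoding with file(1), convert it to UTF-8 with iconv,
# and feed the result to grep. Returns 0 if any file matched, like grep.
smartgrep() {
    sg_pattern=$1
    shift
    sg_status=1
    for sg_file in "$@"; do
        # Prints a name such as "us-ascii", "utf-8", "utf-16le", "utf-32le".
        sg_enc=$(file -b --mime-encoding "$sg_file")
        case $sg_enc in
            utf-16*) sg_from=UTF-16 ;;  # generic name lets iconv consume the BOM
            utf-32*) sg_from=UTF-32 ;;
            *)       sg_from=UTF-8 ;;   # ASCII and anything unrecognized
        esac
        # --label makes grep report the real file name instead of "(standard input)".
        if iconv -f "$sg_from" -t UTF-8 < "$sg_file" |
            grep -i -H --label="$sg_file" -e "$sg_pattern"; then
            sg_status=0
        fi
    done
    return "$sg_status"
}

smartgrep hello "$tmp"/*.txt
```

Treating everything unrecognized as UTF-8 keeps the sketch short; a more careful version would skip files that `file` reports as binary.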