I'm working with UTF-16LE encoded CSV files. I use the Perl module Text::CSV_XS to handle the data:
```perl
use Text::CSV_XS;

my $csv = Text::CSV_XS->new({ binary => 1, sep_char => ';', quote_char => undef });
open my $io, '<:encoding(UTF-16LE)', $csv_file or die "$csv_file: $!";
my $header_row = $csv->getline($io);
```
When I print the first field of the header row with Data::Dumper, the BOM shows up in the output:
```perl
use Data::Dumper;
print Dumper $header_row->[0];
# output:
# $VAR1 = "\x{feff}first header col";
```
According to perldoc, the BOM is preserved because I explicitly declare the content as UTF-16LE. If I use just `:encoding(UTF-16)`, the BOM is removed (with plain UTF-16 the BOM is required and is consumed to determine the byte order).
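To double-check this behavior, here is a minimal, self-contained sketch using only core modules (Encode, File::Temp). It writes a small BOM-prefixed UTF-16LE file and reads the first line back through both layers; the file contents and the `first_line` helper are made up for illustration and are not part of Text::CSV_XS.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Build a small UTF-16LE file that starts with a BOM (bytes FF FE),
# similar to what many Windows tools produce. Deleted at program exit.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp, ':raw';
print {$tmp} "\xFF\xFE", encode('UTF-16LE', "first header col;second\n");
close $tmp or die "close: $!";

# Hypothetical helper: read the first line through the given encoding layer.
sub first_line {
    my ($layer) = @_;
    open my $fh, "<:raw:encoding($layer)", $path or die "$path: $!";
    my $line = <$fh>;
    close $fh;
    return $line;
}

my $le  = first_line('UTF-16LE');  # BOM is decoded as an ordinary U+FEFF character
my $any = first_line('UTF-16');    # BOM is consumed to detect the byte order

printf "UTF-16LE: first char is U+%04X\n", ord $le;   # prints "UTF-16LE: first char is U+FEFF"
printf "UTF-16:   first char is U+%04X\n", ord $any;  # prints "UTF-16:   first char is U+0066"
```

So with an explicit endianness the BOM is just data, which is why it ends up in the first parsed field.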
But I would like to keep `UTF-16LE` in the code so that the required encoding is stated explicitly. I assume this is good practice; if not, please tell me.
But then I have to strip the BOM myself, e.g. with `$header_row->[0] =~ s/^\x{FEFF}//;`
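An alternative I am considering is to consume the BOM at the filehandle level, so the parser never sees it and the encoding stays explicit. This is a minimal sketch assuming the files are always little-endian; `open_utf16le_skip_bom` is a hypothetical helper name. It opens the file raw, skips the bytes `FF FE` if present (rewinding otherwise), and only then pushes the `:encoding(UTF-16LE)` layer. The returned handle could then be passed to `$csv->getline` instead of `$io`; the demo below uses plain `<$fh>` so it runs with core modules only.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Hypothetical helper: open a UTF-16LE file, consume a leading BOM
# (bytes FF FE) if there is one, then push the decoding layer.
# Whatever reads from the returned handle never sees U+FEFF.
sub open_utf16le_skip_bom {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "$path: $!";
    my $got = read $fh, my $bom, 2;
    die "$path: $!" unless defined $got;
    # No BOM: rewind so the first two bytes are decoded as normal text.
    if ($got < 2 || $bom ne "\xFF\xFE") {
        seek $fh, 0, 0 or die "$path: cannot rewind: $!";
    }
    binmode $fh, ':encoding(UTF-16LE)' or die "$path: binmode: $!";
    return $fh;
}

# Demo: a BOM-prefixed file and a BOM-less file should decode to the same text.
my %first;
for my $case ([with_bom => "\xFF\xFE"], [without_bom => '']) {
    my ($name, $prefix) = @$case;
    my ($tmp, $path) = tempfile(UNLINK => 1);
    binmode $tmp, ':raw';
    print {$tmp} $prefix, encode('UTF-16LE', "first header col;second\n");
    close $tmp or die "close: $!";

    my $fh = open_utf16le_skip_bom($path);
    $first{$name} = <$fh>;
    close $fh;
}
print $first{with_bom} eq $first{without_bom} ? "same\n" : "different\n";
```

The trade-off is that this only recognizes the little-endian BOM; a big-endian file would still be misread, which is the same limitation as hard-coding `UTF-16LE` in the layer.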
Is this normal? Do I have to deal with BOMs in my strings when working with UTF-16 encoded files, or am I doing something wrong?