Do I have to remove the BOM by myself?


I'm working with UTF-16LE encoded CSV files. I use the Perl module Text::CSV_XS to handle the data:

my $csv = Text::CSV_XS->new ({ binary => 1, sep_char => ';', quote_char => undef, });
open my $io, '<:encoding(UTF-16LE)', $csv_file or die "$csv_file: $!";
my $header_row = $csv->getline($io);

When I print the first field of the header row using Data::Dumper, the BOM shows up in the output:

print Dumper $header_row->[0];
# output:
# $VAR1 = "\x{feff}first header col";

According to perldoc, the BOM is preserved because I explicitly declare the content to be UTF-16LE. If I write only :encoding(UTF-16), without an explicit byte order, the BOM is used to detect the endianness and is then removed.

But I would like to keep the explicit UTF-16LE in the code to state the required encoding. I think this is good practice; if not, please tell me.

But then, I have to handle the BOM, e.g. by writing: $header_row->[0] =~ s/^\x{FEFF}//;
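To make the behaviour concrete, here is a minimal sketch that reproduces it in memory with the core Encode module instead of a file handle. The sample header line is made up for illustration, and the sketch assumes that decode() follows the same BOM rules as the :encoding() layer.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample data: a BOM followed by one header line, as UTF-16LE bytes.
my $bytes = encode('UTF-16LE', "\x{FEFF}first header col;second\n");

# Explicit endianness: the BOM is decoded as an ordinary character and kept.
my $le = decode('UTF-16LE', $bytes);

# Generic UTF-16: the BOM is read to determine the byte order and then dropped.
my $auto = decode('UTF-16', $bytes);

# Stripping the BOM by hand after decoding with an explicit byte order.
(my $stripped = $le) =~ s/^\x{FEFF}//;

printf "UTF-16LE keeps BOM: %s\n", ($le   =~ /^\x{FEFF}/ ? 'yes' : 'no');
printf "UTF-16 keeps BOM:   %s\n", ($auto =~ /^\x{FEFF}/ ? 'yes' : 'no');
```

After the manual substitution, $stripped holds the same string that the generic UTF-16 decode returns.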

Is this normal? Do I have to deal with BOMs in my strings when working with UTF-16 encoded files? Or am I doing something wrong?
