Perl Encode - UK characters


This is part 2 of an earlier question.

I'm trying out the Encode module but having no luck at all.

use Encode;
use utf8;

# Should print: iso-8859-15
print "Latin-9 Encoding: ".find_encoding("latin9")->name."\n"; 

my $encUK = encode("iso-8859-15", "UK €");
print "Encoded UK: ".$encUK."\n";

Results:

Encoded UK: UK €

Shouldn't the results be encoded? What am I doing wrong here?

EDIT:

Added the suggested:

use utf8;

and now I get this:

Encoded UK: UK �

pulling hair out now :/


There are 3 answers

daxim On BEST ANSWER

Don't pull your hair out. You did everything right and are already getting the intended data. The output is confusing only because you are probably viewing it in a terminal that is set up for a different encoding, presumably UTF-8, rather than Latin-9.

> perl -e'use utf8; use Encode; print encode "Latin-9", "Euro €"'
Euro �

> perl -e'use utf8; use Encode; print encode "Latin-9", "Euro €"' | hex
0000  45 75 72 6f 20 a4                                 Euro .

Byte 0xA4 is indeed the Euro sign in Latin-9. A UTF-8 terminal shows � instead because a lone 0xA4 byte is not valid UTF-8.
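If no hex dump tool is at hand, the same check can be done from Perl itself. The following is only a minimal sketch: `hex_bytes` is a hypothetical helper name, and the Euro sign is written as the escape `\x{20AC}` so that the result does not depend on how the script file is saved.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: format a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', unpack('(H2)*', $bytes);
}

# "\x{20AC}" is U+20AC EURO SIGN, written as an escape so the example
# does not depend on the encoding of this source file.
my $latin9 = encode('iso-8859-15', "Euro \x{20AC}");
print hex_bytes($latin9), "\n";    # prints: 45 75 72 6f 20 a4

# To display the text on a UTF-8 terminal, decode the Latin-9 bytes back
# to characters and encode those characters as UTF-8 for output.
print encode('UTF-8', decode('iso-8859-15', $latin9)), "\n";
```

The first line of output matches the hex dump above; the second should show the Euro sign correctly if the terminal really is UTF-8.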

Ether On

I think perhaps the character is not encoded properly in your script. What encoding does your editor think the file is in?

For example, I just tried this, using an escape for the character to circumvent that entirely:

use Encode;

# Should print: iso-8859-15
print "Latin-9 Encoding: ".find_encoding("latin9")->name."\n";

my $encUK = encode("iso-8859-15", "\xA3");
print "Encoded UK: ", $encUK, "\n";

output:

 
Latin-9 Encoding: iso-8859-15
Encoded UK: £

dolmen On

"use utf8;" has, since Perl 5.8, only been used to tell Perl that your source file is encoded in UTF-8.
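To see what difference that makes, here is a rough sketch that simulates both cases with escapes instead of a literal "€", so it behaves the same however the file is saved. `hex_bytes` is a hypothetical helper, and the assumption is that the literal was stored as the three UTF-8 bytes E2 82 AC.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: format a byte string as space-separated hex pairs.
sub hex_bytes { return join ' ', unpack('(H2)*', $_[0]) }

# What Perl sees for a UTF-8 encoded "€" literal WITHOUT "use utf8":
# three separate characters, one per source byte.
my $without = "\xE2\x82\xAC";

# What Perl sees WITH "use utf8": the single character U+20AC.
my $with = "\x{20AC}";

printf "without: %d chars -> %s\n",
    length($without), hex_bytes(encode('iso-8859-15', $without));
printf "with:    %d chars -> %s\n",
    length($with), hex_bytes(encode('iso-8859-15', $with));
# prints:
# without: 3 chars -> e2 82 ac
# with:    1 chars -> a4
```

In the first case the three bytes pass through unchanged, which is presumably why the question's first attempt still displayed "€" on a UTF-8 terminal. Only the second case yields the Latin-9 byte A4.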

So does the encoding of your source file really match what you're telling Perl?

With vim, you must use this option to write the file in UTF-8:

:set fenc=utf8

And to get back UTF-8 when you load the file, you must define fileencodings in your .vimrc:

set fileencodings=ucs-bom,utf-8,latin9
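Independently of the editor, one way to check whether a script file really is UTF-8 is to read it as raw bytes and try a strict decode. This is only a sketch under assumptions: `is_valid_utf8` is a hypothetical helper name, and the file path is taken from the command line.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns 1 if $bytes is well-formed UTF-8, else 0.
sub is_valid_utf8 {
    # Work on a copy: decode() with a CHECK value may modify its argument.
    my ($bytes) = @_;
    return eval { decode('UTF-8', $bytes, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

# Usage: perl check.pl some_script.pl
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $content = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    print is_valid_utf8($content) ? "valid UTF-8\n" : "not valid UTF-8\n";
}
```

Note that a file containing only ASCII passes this check whatever the editor setting is, so the result is only informative once the file contains a character such as "€" or "£".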