Before posting my question to the ActiveState forum, I'd like to try my luck here :)
I'm trying to convert a simple script of mine to an .exe file using PerlApp (version 8.1). The Perl script works fine, and PerlApp seems to do its job successfully. But the converted .exe file behaves strangely, in a way that I believe must be related to UTF-8 encoding. For example, the Perl script produces output like this:
hàn huáng zhòng sè sī qīng guó
But running the executable file gives me only this:
h hu zh s s q gu
I've already configured PerlApp so that utf8.pm is explicitly added, but the problem just refuses to go away. I've also tried a few other things, for example
binmode DATA, ":utf8";
and
">:encoding(utf8)"
but without any luck.
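For reference, here is a minimal sketch of what those two attempts look like when combined, run under plain perl rather than PerlApp. An in-memory filehandle stands in for the DATA section, and the variable names `$raw` and `$out_bytes` are placeholders for illustration, not part of the real script. It only shows what the layers are supposed to do; it says nothing about how the packaged executable behaves.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Stand-in for the DATA section: two dictionary lines held as raw UTF-8 bytes.
my $raw = encode('UTF-8', "3400 Qi\x{16B}\n3401 TI\x{1CE}N\n");

# Read the bytes through a decoding layer, the same job that
# binmode DATA, ':encoding(UTF-8)' would do for the real DATA handle.
open my $data, '<:encoding(UTF-8)', \$raw
    or die "cannot open in-memory input: $!";
my %zidian = map { chomp; split /\s+/, $_, 2 } <$data>;
close $data;

# Write through an encoding layer so accented letters leave as UTF-8 bytes.
my $out_bytes = '';
open my $out, '>:encoding(UTF-8)', \$out_bytes
    or die "cannot open in-memory output: $!";
print {$out} lc $zidian{'3401'}, "\n";
close $out;
```

If the layers are working, each value in `%zidian` is a decoded character string (so `length` counts letters, not bytes), and `$out_bytes` holds the lowercased pinyin encoded as UTF-8.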
Can anyone kindly give me a hint as to what the reason might be? Thanks as always :)
I could post the whole code here, but that seems unnecessary, so I'm just pasting the snippets that I think are relevant to the problem:
use utf8;
%zidian = map {chomp;split/\s+/,$_,2} <DATA>;
open my $in,'<:utf8',"./original.txt";
open my $out,'>:utf8',"./modified.txt";
if ( $code ~~ %zidian ) {
    $value = lc $zidian{$code};
}
__DATA__
3400 Qiū
3401 TIǎN
3404 KUà
3405 Wǔ
One more thing: I'm running ActivePerl 5.10.0 on Windows XP (Chinese version), and the script is saved as UTF-8 without a BOM. PerlApp cannot handle a script that has a BOM.
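To double-check the "no BOM" claim for a given file, a small test like the following can help. This is only a sketch: `has_utf8_bom` and `file_has_utf8_bom` are hypothetical helper names, and the check assumes the UTF-8 BOM is the byte sequence EF BB BF at the very start of the file.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a byte string starts with the UTF-8 BOM (EF BB BF).
sub has_utf8_bom {
    my ($bytes) = @_;
    return substr($bytes, 0, 3) eq "\xEF\xBB\xBF" ? 1 : 0;
}

# Hypothetical helper: read the first three bytes of a file in raw mode and test them.
sub file_has_utf8_bom {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "cannot open $path: $!";
    my $count = read($fh, my $head, 3);
    close $fh;
    return ( defined $count && $count == 3 ) ? has_utf8_bom($head) : 0;
}
```

Calling `file_has_utf8_bom('original.txt')` or `file_has_utf8_bom($0)` then reports whether the input file or the script itself carries a BOM.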
Edit
If I were to give a workable snippet, I suppose it would amount to giving the whole code, because I'm using three interconnected subroutines, which I took with some modifications from the Lingua::Han::PinYin and Lingua::Han::Utils modules.
#! perl
# to make good vertical alignment,
# set font family to SonTi and font size to Four(12pts)
use utf8;
sub Unihan {
    my $hanzi = shift;
    # Return each character's code point as an uppercase hex string.
    my @unihan = map { uc sprintf("%x",$_) } unpack ("U*", $hanzi);
    return @unihan;
}
sub csplit {
    my $hanzi = shift;
    my @return_hanzi;
    my @code = Unihan($hanzi);
    foreach my $code (@code) {
        my $value = pack("U*", hex $code);
        push @return_hanzi, $value if ($value);
    }
    return wantarray ? @return_hanzi : join( '', @return_hanzi );
}
%zidian = map {chomp;split/\s+/,$_,2} <DATA>;
sub han2pinyin {
    my $hanzi = shift;
    my @pinyin;
    my @code = Unihan($hanzi);
    foreach $code (@code) {
        if ( $code ~~ %zidian ) {
            $value = lc $zidian{$code};
        }
        else {
            $value = " ";
        }
        push @pinyin, $value;
    }
    return wantarray ? @pinyin : join( '', @pinyin );
}
open $in,'<:utf8',"./original.txt";
seek $in, 3,0;
open $out,'>:utf8',"./modified.txt";
while(<$in>){
    s/(.{18})/$1\n/g;
    push @tmp, $_;
}
foreach (@tmp){
    my @hanzi;
    my @pinyin;
    @hanzi = csplit($_);
    my $hang = join "", @hanzi;
    @pinyin = han2pinyin($hang);
    for ( my $i = 0; $i < @hanzi && $i < @pinyin; ++$i ) {
        # Chinese punctuation written as code points, since the input is decoded: ， 。 ！ ？ “ ” ：
        if ( $hanzi[$i] =~ /[\x{FF0C}\x{3002}\x{FF01}\x{FF1F}\x{201C}\x{201D}\x{FF1A}]/ ) {
            splice(@pinyin, $i, 0," ");
        }
    }
    printf $out "%-7s" x @pinyin, @pinyin;
    print $out "\n";
    printf $out "%-6s" x @hanzi, @hanzi;
    print $out "\n";
}
__DATA__
3400 Qiū
3401 TIǎN
3404 KUà
3405 Wǔ
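The punctuation test inside the loop is meant to catch Chinese punctuation marks, identified by UTF-8 byte sequences such as EF BC 8C. Once the input has been decoded by the `:utf8` layer, each mark is a single character, so a pattern has to name it by code point rather than by its bytes. The following minimal sketch does that conversion; the variable names `@hex`, `@code_points` and `$punct` are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 byte sequences of the punctuation marks, written as hex strings.
my @hex = qw(EFBC8C E38082 EFBC81 EFBC9F E2809C E2809D EFBC9A);

# Decode each byte sequence to one character and format its code point.
my @code_points = map {
    my $char = decode('UTF-8', pack('H*', $_));
    sprintf 'U+%04X', ord $char;
} @hex;

# A character class over the same marks, usable on decoded strings.
my $punct = qr/[\x{FF0C}\x{3002}\x{FF01}\x{FF1F}\x{201C}\x{201D}\x{FF1A}]/;

print "$_\n" for @code_points;
```

This prints U+FF0C, U+3002, U+FF01, U+FF1F, U+201C, U+201D and U+FF1A, one per line. By contrast, a class such as `[\xEFBC8C]` is read by Perl as the character `\xEF` followed by the literal characters `B`, `C`, `8`, `C`.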
ActiveState hasn't given me any help yet. Whatever. I've now figured out a workaround for my problem, and it looks very weird.
First, I added some otherwise useless UTF-8 encoded characters to my DATA section, like the following:
Then I removed the use utf8; pragma from my script, and I removed the utf8 flag from the following line of code:
Now it becomes
But I had to leave the following line of code unchanged:
Then everything was okay, except that I received "Wide character in print" warnings. So I added another line of code:
And then I PerlApped my script and everything worked fine :)
This is really strange. I suspect the problem is somehow OS-specific. It's also quite likely that there's something wrong with my Windows system. I also tried Perl2Exe, and the compiled executable gave me a "memory 0010c4 cannot be read" error. Whatever. I've somehow fixed my problem myself :)