Why are some characters missing after I converted my Perl script to an executable using PerlApp?


Before posting my question to the ActiveState forum, I'd like to try my luck here :)

I'm trying to convert a simple script of mine to an .exe file using PerlApp (version 8.1). The Perl script works fine, and PerlApp seems to have done its job successfully. But the converted .exe file behaves strangely in a way that, I believe, must be related to UTF-8 encoding. For example, the Perl script yields a result like:

hàn    huáng  zhòng  sè     sī     qīng   guó 

But running the executable file would give me only this:

h      hu     zh     s      s      q      gu

I've already configured PerlApp so that utf8.pm is explicitly added, but the problem just refuses to go away. I've also tried a few other things. For example,

binmode DATA, ":utf8"; 

and

">:encoding(utf8)"

but without any luck.

Can anyone kindly give me a hint as to what the reason might be? Thanks as always :)

I can post the whole code here but it seems unnecessary so I just paste some snippets of the code that I think is relevant to the problem:

use utf8;

%zidian = map {chomp;split/\s+/,$_,2} <DATA>;

open my $in,'<:utf8',"./original.txt";
open my $out,'>:utf8',"./modified.txt";

if ( $code~~%zidian) {
           $value = lc$zidian{$code};
}

__DATA__
3400    Qiū
3401    TIǎN
3404    KUà
3405    Wǔ

One more thing: I'm running ActivePerl 5.10.0 on Windows XP (Chinese version), and the script is saved as UTF-8 without a BOM. PerlApp cannot handle a script that has a BOM.
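One way to check whether the DATA lines are being treated as characters or as raw bytes is to decode and encode explicitly with the core Encode module, instead of relying on `use utf8` or handle layers. The following is only a minimal sketch under that assumption: `$raw_line` is a hypothetical stand-in for one DATA line, written out as the UTF-8 bytes an unlayered handle would return. It does not show what PerlApp itself does.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for one DATA line, given as the raw UTF-8 bytes
# that a handle without any encoding layer would return ("3400    Qiū").
my $raw_line = "3400    Qi\xC5\xAB\n";

# Decode bytes into characters, then split into code and pinyin.
my $line = decode('UTF-8', $raw_line);
chomp $line;
my ($code, $pinyin) = split /\s+/, $line, 2;

# lc now works on characters, so the accented vowel stays intact.
my $lower = lc $pinyin;

# Encode back to bytes before writing to a handle that has no layer.
my $out_bytes = encode('UTF-8', $lower);

printf "%s has %d characters and encodes to %d bytes\n",
    $code, length($lower), length($out_bytes);
# prints "3400 has 3 characters and encodes to 4 bytes"
```

If the compiled executable and the plain script disagree on these lengths, that would suggest the two are reading the DATA section with different layers.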

Edit

A workable snippet would amount to the whole code, because I'm using three interconnected subroutines, which I took with some modifications from the Lingua::Han::PinYin and Lingua::Han::Utils modules.

#! perl
# to make good vertical alignment,
# set font family to SonTi and font size to Four(12pts)
use utf8;


sub Unihan {
    my $hanzi = shift;
    my @unihan = map { uc sprintf("%x",$_) } unpack ("U*", $hanzi);
    }

sub csplit {
    my $hanzi = shift;
    my @return_hanzi;
    my @code = Unihan($hanzi);
    foreach my $code (@code) {
        my $value = pack("U*", hex $code);
        push @return_hanzi, $value if ($value);
    }
    return wantarray ? @return_hanzi : join( '', @return_hanzi );
    }

%zidian = map {chomp;split/\s+/,$_,2} <DATA>;

sub han2pinyin {
    my $hanzi = shift;
    my @pinyin;
    my @code = Unihan($hanzi);
    foreach $code (@code) {
        if ( $code~~%zidian) {
            $value = lc$zidian{$code};
        }
        else {
            $value = " ";
        }
        push @pinyin, $value;
    }
    return wantarray ? @pinyin : join( '', @pinyin );
}

open $in,'<:utf8',"./original.txt";
seek $in, 3,0;
open $out,'>:utf8',"./modified.txt";

while(<$in>){
     s/(.{18})/$1\n/g;
     push @tmp, $_;
}

foreach (@tmp){
    my @hanzi;
    my @pinyin;
    @hanzi = csplit($_);
    my $hang = join "", @hanzi;
    @pinyin = han2pinyin($hang);

    for ( my $i = 0; $i < @hanzi && $i < @pinyin; ++$i ) {
        if ( $hanzi[$i] =~ /[\xEFBC8C]|[\xE38082]|[\xEFBC81]|[\xEFBC9F]|[\xE2809C]|[\xE2809D]|[\xEFBC9A]/ ) {
            splice(@pinyin, $i, 0," ");
        }
    }

    printf $out "%-7s" x @pinyin, @pinyin;
    print $out "\n";
    printf $out "%-6s" x @hanzi, @hanzi;
    print $out "\n";
}


__DATA__
3400    Qiū
3401    TIǎN
3404    KUà
3405    Wǔ
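A side note on the punctuation test in the loop above: the input is read through a `:utf8` layer, so each element of `@hanzi` is a decoded character, while a class such as `[\xEFBC8C]` is parsed as `\xEF` followed by the literal characters `B`, `C`, `8` and `C`. The sketch below assumes the intent was to match the full-width and curly punctuation whose UTF-8 byte sequences appear in that pattern, and writes them as code points. The sample list is hypothetical.

```perl
use strict;
use warnings;

# Code points for the seven punctuation marks whose UTF-8 bytes appear
# in the original pattern: U+FF0C, U+3002, U+FF01, U+FF1F, U+201C,
# U+201D and U+FF1A.
my $punct = qr/[\x{FF0C}\x{3002}\x{FF01}\x{FF1F}\x{201C}\x{201D}\x{FF1A}]/;

# Hypothetical sample: two Han characters around a full-width comma.
my @hanzi = ("\x{6C49}", "\x{FF0C}", "\x{7687}");

# 1 where the element is punctuation, 0 otherwise.
my @flags = map { /$punct/ ? 1 : 0 } @hanzi;

# The byte-style class is \xEF plus the literals B, C and 8,
# so it does not match the decoded full-width comma.
my $old_matches = "\x{FF0C}" =~ /[\xEFBC8C]/ ? 1 : 0;

print "@flags / $old_matches\n";
# prints "0 1 0 / 0"
```

This is unrelated to the missing accented letters, but it would explain why the extra padding for punctuation is never inserted.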

There is 1 answer

Mike (best answer)

ActiveState hasn't given me any help yet. Whatever. Now I've figured out a workaround for my problem and this workaround looks very weird.

First I added some otherwise useless utf-8 encoded characters to my DATA section, like the following:

__DATA__
aardvark 'ɑ:dvɑ:k
aardwolf 'ɑ:dwulf
aasvogel 'ɑ:sfәugәl
3400    Qiū
3401    TIǎN
3404    KUà
3405    Wǔ

Then I removed the use utf8; pragma from my script, and removed the :utf8 layer from the following line of code:

open $out,'>:utf8',"./modified.txt";

Now it becomes

open $out,'>',"./modified.txt";

But I had to leave the following line of code unchanged:

open $in,'<:utf8',"./original.txt";

Then everything was okay, except that I'd receive "Wide character in print" warnings. So I added another line of code to silence them:

no warnings;

And then I Perlapped my script and everything worked fine :)

This is really strange. I suspect the problem is somehow OS-specific. It's also quite likely that there's something wrong with my Windows system. I also tried Perl2exe, and the compiled executable gave me a "memory 0010c4 cannot be read" error. Whatever. My problem is somehow fixed by myself :)
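For readers wondering about the "Wide character in print" warning mentioned above: perl emits it when a string containing characters above 0xFF is printed to a handle that has no encoding layer. The sketch below is illustrative only and says nothing about PerlApp. It uses in-memory filehandles and a hypothetical `$pinyin` value to compare a plain handle with one opened with `:encoding(UTF-8)`, as a possible alternative to `no warnings;`.

```perl
use strict;
use warnings;

# Decoded characters, as they would arrive from a '<:utf8' input handle.
my $pinyin = "qi\x{016B}";

# Collect warnings instead of printing them (for demonstration only).
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# No layer: perl warns and writes the string's UTF-8 form anyway.
open my $plain, '>', \my $plain_buf or die "open failed: $!";
print {$plain} $pinyin;
close $plain;

# With an encoding layer: the same bytes are written, without a warning.
open my $layered, '>:encoding(UTF-8)', \my $layered_buf or die "open failed: $!";
print {$layered} $pinyin;
close $layered;

printf "warnings: %d, same bytes: %s\n",
    scalar(@warnings), ($plain_buf eq $layered_buf ? "yes" : "no");
```

In this sketch both buffers end up with the same bytes, which is consistent with the workaround producing correct output while still warning.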