"Pathname can't be converted from UTF-8 to current locale" warning with Libarchive::Read module


I'm listing the contents of tar.gz files with the Libarchive::Read module. When a tarball file name contains UTF-8 characters, I get this message, which is generated by the libarchive C library:

Pathname can't be converted from UTF-8 to current locale.

in block at /Users/steve/.rakubrew/versions/moar-2022.12/share/perl6/site/sources/42AF7739DF41B2DA0C4BF2069157E2EF165CE93E (Libarchive::Read) line 228

The message is triggered by this Raku code:

my $r := Libarchive::Read.new($newest_file);
my $needs_update = False;
for $r -> $entry {  # WARNING THROWN HERE for each file in tarball listing
    $entry.pathname;
    $needs_update = True if $entry.is-file && $entry.pathname && $entry.pathname ~~ / ( \.t || \.pm || \.pm6 ) $ / ;
    last if $needs_update;
}

I'm on a Mac. The locale command reports the following:

LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"

There seems to be a well-reported bug with the libarchive C library: https://github.com/libarchive/libarchive/issues/587.

Is there any way to tell the module, from Raku, which locale is in use, so that I can list tarballs whose file names contain UTF-8 characters?

There is 1 answer

StevieD

To work around this problem, I switched to a more recent Raku module, Archive::Libarchive. This code runs without the warning:

my Archive::Libarchive $a .= new: operation => LibarchiveRead, file => $newest_file.Str;
my Archive::Libarchive::Entry $entry .= new;

my $needs_update = False;
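while $a.next-header($entry) {
    $a.data-skip;  # only the header is needed, so skip the entry's data
    # Check for a non-empty pathname first, skip directories (trailing '/'), then match test and module files
    $needs_update = True if $entry.pathname && $entry.pathname.substr(*-1) ne '/' && $entry.pathname ~~ / ( \.t || \.pm || \.pm6 ) $ / ;
    last if $needs_update;
}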
$a.close;

This module also uses the libarchive C library, but apparently in a way that handles UTF-8 path names correctly.
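
The path-name test used in both loops can be pulled out and checked on its own, without opening an archive. This is only a minimal sketch: the sub name `is-source-file` and the sample paths are made up for illustration, and it assumes (as the loop above does) that directory entries are reported with a trailing '/'.

```raku
# Hypothetical helper, not part of either module:
# True for test and module files, False for directories and anything else.
sub is-source-file(Str $pathname --> Bool) {
    return False unless $pathname;            # empty pathname
    return False if $pathname.ends-with('/'); # directory entry
    so $pathname ~~ / ( \.t || \.pm || \.pm6 ) $ /;
}

# Made-up sample paths, including non-ASCII names
say is-source-file('lib/Ünïcode/Módulo.pm6');  # True
say is-source-file('t/01-básico.t');           # True
say is-source-file('t/');                      # False
say is-source-file('README.md');               # False
```

Inside the loop, the condition would then shrink to `is-source-file($entry.pathname)`.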