SUMMARY
How can I write a zip file using libarchive in C++, such that path names will be UTF-8 encoded? With UTF-8 path names, special characters will be decoded correctly when using OS X / Linux / Windows 8 / 7-Zip / WinZip.
DETAILS
I am trying to write a zip archive using libarchive, compiling with Visual C++ 2013 on Windows.
I would like to be able to add files with non-ASCII chars (e.g. äöü.txt) to the zip archive.
There are four functions to set the pathname header in libarchive:
void archive_entry_set_pathname(struct archive_entry *, const char *);
void archive_entry_copy_pathname(struct archive_entry *, const char *);
void archive_entry_copy_pathname_w(struct archive_entry *, const wchar_t *);
int archive_entry_update_pathname_utf8(struct archive_entry *, const char *);
Unfortunately, none of them seem to work.
In particular, I have tried:
const char* myUtf8Str = ...
archive_entry_update_pathname_utf8(entry, myUtf8Str);
// this sounded like the most straightforward solution
and
const wchar_t* myUtf16Str = ...
archive_entry_copy_pathname_w(entry, myUtf16Str);
// UTF-16 encoded strings seem to be the default on Windows
In both cases, the resulting zip archive does not show the file names correctly in either Windows Explorer or 7-Zip.
I am certain that my input strings are encoded correctly, since I convert them from Qt QString
instances that work perfectly well in other parts of my code:
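const QByteArray utf8Bytes = filename.toUtf8();  // keep the buffer alive; a pointer into a temporary would dangle
const char* myUtf8Str = utf8Bytes.constData();
const std::wstring utf16Str = filename.toStdWString();
const wchar_t* myUtf16Str = utf16Str.c_str();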
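One way to double-check that a path name really is UTF-8 before handing it to libarchive is to dump its bytes. The following is only a minimal sketch using the standard library; hexDump is a hypothetical helper name, not part of Qt or libarchive.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: render every byte of `bytes` as two lowercase hex
// digits, separated by single spaces.
std::string hexDump(const std::string& bytes) {
    std::string out;
    char buf[4];
    for (unsigned char c : bytes) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(c));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

For äöü.txt encoded as UTF-8, the dump should start with c3 a4 c3 b6 c3 bc (two bytes per umlaut), followed by the ASCII bytes of .txt. A Windows-1252 encoded string would instead start with e4 f6 fc.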
For instance, this works even for another call to libarchive, when creating the zip file:
archive_write_open_filename_w(archive, zipFile.toStdWString().c_str());
// creates a zip archive file where the non-ASCII
// chars are encoded correctly, e.g. äöü.zip
I have also tried to change the options for libarchive, as suggested by this example:
archive_write_set_options(a, "hdrcharset=UTF-8");
However, this call fails. I assume that I have to set some other option, but I am running out of ideas...
UPDATE 2
I have done some more reading about the zip format. It allows writing file names in UTF-8, such that OS X / Linux / Windows 8 / 7-Zip / WinZip will always decode them correctly, see e.g. here.
This is what I want to achieve using libarchive, i.e. I would like to pass it my UTF-8 encoded pathname
and have it store that in the zip file without doing any conversion.
I have added the "set locale" approach as an (unsatisfying) answer.
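For reference, the zip format signals UTF-8 file names through bit 11 (the "language encoding flag") of the general purpose bit flag in each file header. The sketch below only inspects the first local file header of an archive that has already been read into memory, so it is a quick diagnostic rather than a full parser; localHeaderHasUtf8Flag is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if `data` begins with a zip local file
// header whose general purpose bit 11 (UTF-8 file names) is set.
bool localHeaderHasUtf8Flag(const std::vector<std::uint8_t>& data) {
    // Layout: 4-byte signature "PK\x03\x04", 2 bytes "version needed",
    // then the 2-byte general purpose bit flag (little-endian).
    if (data.size() < 8) return false;
    if (data[0] != 0x50 || data[1] != 0x4B || data[2] != 0x03 || data[3] != 0x04)
        return false;
    const std::uint16_t flags =
        static_cast<std::uint16_t>(data[6] | (data[7] << 8));
    return (flags & 0x0800) != 0;
}
```

If this returns false for an archive written by libarchive, readers will fall back to some legacy code page when decoding the names, which would match the behaviour described above.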
This is a workaround that will store path names using the system's locale settings, i.e. the resulting zip file can be decoded correctly on the same system, but is not portable.
This is not a satisfying solution; I am only posting it to show that it is not what I am looking for.
Set the global locale to "" as explained here, and then read it back.
Then set the pathname by using archive_entry_update_pathname_utf8.
The zip file now contains file names encoded with Windows-1252, so my Windows can read them, but they appear as garbage on e.g. Linux.
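The locale step described above can be sketched with standard C++ as follows. The original snippets are not preserved here, so this is an assumption about what they looked like; useSystemLocale is a hypothetical helper name.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: switch the global locale to the user's preferred
// locale (the empty name "") and return the name of the resulting locale.
std::string useSystemLocale() {
    try {
        // For a named locale this also calls std::setlocale(LC_ALL, ...),
        // which is what C libraries get to see.
        std::locale::global(std::locale(""));
    } catch (const std::runtime_error&) {
        // The environment names a locale that is not installed;
        // the previous global locale stays in effect.
    }
    // A default-constructed std::locale is a copy of the global locale.
    return std::locale().name();
}
```

The returned name depends entirely on the system settings, which is exactly why archives written this way are not portable.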
Future
There is a libarchive issue for UTF-8 filenames. The whole story is quite complicated, but it sounds like they may add better UTF-8 support in libarchive 4.0.