I have a tar file whose entries have multibyte (Japanese) file names. I am using libarchive to untar it. The file names inside the tar file are encoded in UTF-8, but when I extract the archive, the Japanese characters in the names are always lost.
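It helps to look at the problem at the byte level. A Japanese character such as 日 (U+65E5) takes three bytes in UTF-8, so unless the system ANSI code page happens to be UTF-8, any conversion that assumes that code page will misread those bytes. The minimal sketch below encodes a single code point by hand to show this. The function name `encode_utf8` is hypothetical and is not part of libarchive or any other library mentioned here.

```cpp
#include <cassert>
#include <string>

// Encode a single Unicode code point as UTF-8 bytes.
// Illustration only: surrogate code points are not rejected.
std::string encode_utf8(char32_t cp) {
    assert(cp <= 0x10FFFF && "not a Unicode code point");
    std::string out;
    if (cp < 0x80) {                      // 1 byte: ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {              // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {            // 3 bytes: kana and common kanji
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                              // 4 bytes: supplementary planes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, `encode_utf8(0x65E5)` yields the three bytes `E6 97 A5`. A converter that reads the name as a Western code page sees three separate characters here, and one that reads it as Shift-JIS sees a different, wrong character sequence.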
I wrote a Python script that does what I want, and it works:
#!/usr/bin/python27
import tarfile

def transform(data):
    # Decode the UTF-8 bytes of the member name into a unicode object,
    # so the extracted file is created under the correct Japanese name.
    u = data.decode('utf8')
    #return u.encode('utf8')
    return u

tar = tarfile.open('abc.tar')
for m in tar.getmembers():
    print m.name
    m.name = transform(m.name)
    #print m.name
tar.extractall()
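In the Python script, `transform()` does the real work: `data.decode('utf8')` turns the raw UTF-8 bytes of the member name into Unicode text. A minimal C++ analogue, assuming the C++ side receives the name as a `std::string` of UTF-8 bytes, is a strict decoder like the hypothetical `decode_utf8` below. It is a hand-written sketch, not a libarchive or Boost API, and it throws `std::invalid_argument` roughly where Python would raise `UnicodeDecodeError` (it is stricter than Python 2 in rejecting encoded surrogates).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Strictly decode a UTF-8 byte string into Unicode code points.
std::u32string decode_utf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra;  // number of continuation bytes that follow
        char32_t cp;
        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (bytes.size() - i < extra + 1)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);  // append 6 payload bits
        }
        // Reject overlong forms, UTF-16 surrogates and values past U+10FFFF.
        static const char32_t min_for_len[] = {0, 0x80, 0x800, 0x10000};
        if (cp < min_for_len[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid UTF-8 sequence");
        out += cp;
        i += extra + 1;
    }
    return out;
}
```

Decoding strictly, instead of passing bytes through, means a name that is not actually UTF-8 fails loudly rather than producing a silently misnamed file.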
However, I need to do the same thing in C++. Here is an extract of my C++ code:
while ((entry = tar_file->nextEntry())) {
    fs::path filepath = path / entry->getFileName(); // the UTF-8 characters are lost here

    // So I tried the following:
    int wchars_num = MultiByteToWideChar(CP_ACP, 0, filepath.string().c_str(), -1, NULL, 0);
    wchar_t* wstr = new wchar_t[wchars_num];
    // I tried UTF-8 (CP_UTF8) as well in place of CP_ACP
    MultiByteToWideChar(CP_ACP, 0, filepath.string().c_str(), -1, wstr, wchars_num);
    // But this did not help
    delete[] wstr;
}
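Two details of the snippet above plausibly defeat the conversion. First, it runs on `filepath.string()`, i.e. after the name has already been through `fs::path`'s narrow-string conversion; on Windows, both Boost.Filesystem and `std::filesystem` typically interpret a narrow `std::string` in the ANSI code page, so the damage is done before `MultiByteToWideChar` ever runs. The conversion likely has to be applied to the raw bytes from `entry->getFileName()`, with `CP_UTF8`, before the path is built. Second, what the resulting wide string should hold on Windows is UTF-16, where characters above U+FFFF (such as 𠮷, U+20BB7) become surrogate pairs. The hypothetical `to_utf16` below sketches that last step portably from a sequence of code points; it is an illustration of the encoding, not a replacement for `MultiByteToWideChar`.

```cpp
#include <stdexcept>
#include <string>

// Encode Unicode code points as UTF-16, the encoding used by Windows
// wide-character APIs. Code points above U+FFFF become surrogate pairs.
std::u16string to_utf16(const std::u32string& code_points) {
    std::u16string out;
    for (char32_t cp : code_points) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);  // BMP: one code unit
        } else {
            cp -= 0x10000;  // 20 bits remain, split across two units
            out += static_cast<char16_t>(0xD800 + (cp >> 10));    // high surrogate
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));  // low surrogate
        }
    }
    return out;
}
```

On Windows, where `wchar_t` is 16 bits, `std::wstring(u16.begin(), u16.end())` would then give a wide string that can be appended to an `fs::path` without any code-page conversion. If `fs` is `std::filesystem` (C++17), `std::filesystem::u8path(name)` is another way to build a path directly from UTF-8 bytes.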