Convert string to wstring [Russian characters without locale::global]


Is it possible to convert a string to a wstring (assuming the string contains only Russian characters and the system encoding is UTF-8) without calling std::locale::global(std::locale(""));? I need a solution for C++98.

Some code:

string s = "Николай";
wstring ws;
StrToWstr(ws, s);
printf("str: %ls\n", ws.c_str());

The output is empty. But when I add

std::locale::global(std::locale(""));

it prints "Николай" (the correct output).
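The global call helps because, for a named locale, std::locale::global also acts as if setlocale(LC_ALL, name) were called, and mbstowcs reads the C locale's LC_CTYPE category (which is "C" at program startup). One C++98 way to avoid touching any global state is to take the codecvt facet from a local std::locale object and call it directly. A minimal sketch; WidenWithLocale is a hypothetical name, and which locale names exist depends on the system:

```cpp
#include <cwchar>     // std::mbstate_t
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts aSrc to a wide string with the codecvt
// facet of `loc`, without changing the global C++ or C locale (C++98).
std::wstring WidenWithLocale(const std::string& aSrc, const std::locale& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> Cvt;
    const Cvt& cvt = std::use_facet<Cvt>(loc);

    std::mbstate_t state = std::mbstate_t();
    // For UTF-8 and other common multibyte encodings the output never has
    // more wide characters than the input has bytes; +1 keeps &buf[0]
    // valid when aSrc is empty.
    std::vector<wchar_t> buf(aSrc.size() + 1);

    const char* from = aSrc.data();
    const char* fromEnd = from + aSrc.size();
    const char* fromNext = from;
    wchar_t* to = &buf[0];
    wchar_t* toNext = to;

    // Convert the whole input in one call.
    Cvt::result r = cvt.in(state, from, fromEnd, fromNext,
                           to, to + buf.size(), toNext);
    if (r != Cvt::ok || fromNext != fromEnd)
        throw std::runtime_error("WidenWithLocale: invalid or incomplete input");
    return std::wstring(to, toNext);
}
```

Called as WidenWithLocale(s, std::locale("")), it uses the encoding from the environment (UTF-8, per the question) while the program's global locale stays "C".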

The StrToWstr function:

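#include <stdlib.h>   // mbstowcs, size_t, NULL
#include <string>
using std::string;
using std::wstring;

// Converts aSrc to a wide string using the current C locale (LC_CTYPE).
// Returns the number of wide characters, or (size_t)-1 on failure,
// in which case aDst is left unchanged.
size_t StrToWstr(wstring& aDst, const string& aSrc)
{
    // First pass: count the wide characters needed (no output buffer).
    size_t length = mbstowcs(NULL, aSrc.c_str(), 0);
    if (length != static_cast<size_t>(-1)) {
        wchar_t *buffer = new wchar_t[length + 1];
        // Second pass: perform the actual conversion.
        length = mbstowcs(buffer, aSrc.c_str(), length);
        buffer[length] = L'\0';
        aDst.assign(buffer);
        delete[] buffer;
    }
    return length;
}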
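Since the input is known to be UTF-8 (as the question states), it can also be decoded by hand in plain C++98 without consulting any locale. A minimal sketch; Utf8ToWstr is a hypothetical name, and it assumes wchar_t is either 32-bit (as on Linux, code points stored directly) or 16-bit (as on Windows, UTF-16 surrogate pairs):

```cpp
#include <string>

// Hypothetical helper: decodes UTF-8 in aSrc into aDst without any locale.
// Returns false on malformed input; aDst then holds what was decoded so far.
bool Utf8ToWstr(std::wstring& aDst, const std::string& aSrc)
{
    aDst.clear();
    const std::string::size_type n = aSrc.size();
    std::string::size_type i = 0;
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(aSrc[i]);
        unsigned long cp;
        int extra;  // continuation bytes that follow the lead byte
        if (c < 0x80)      { cp = c;        extra = 0; }
        else if (c < 0xC2) { return false; }  // stray continuation or overlong lead
        else if (c < 0xE0) { cp = c & 0x1F; extra = 1; }
        else if (c < 0xF0) { cp = c & 0x0F; extra = 2; }
        else if (c < 0xF5) { cp = c & 0x07; extra = 3; }
        else               { return false; }
        if (n - i <= static_cast<std::string::size_type>(extra))
            return false;  // sequence truncated at end of input
        for (int k = 1; k <= extra; ++k) {
            unsigned char cc = static_cast<unsigned char>(aSrc[i + k]);
            if ((cc & 0xC0) != 0x80)
                return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        // Reject overlong 3/4-byte forms, surrogates and values past U+10FFFF.
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
        i += extra + 1;
        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // 16-bit wchar_t: emit a UTF-16 surrogate pair.
            cp -= 0x10000;
            aDst += static_cast<wchar_t>(0xD800 + (cp >> 10));
            aDst += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            aDst += static_cast<wchar_t>(cp);
        }
    }
    return true;
}
```

This removes the dependency on mbstowcs. Keep in mind that printf("%ls", ...) converts wide characters back to multibyte as if by wcrtomb, which also reads LC_CTYPE, so in the default "C" locale printing ws can still fail even when ws itself is correct.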

Debugging shows that ws contains the following:

    Name : ws
        Details:{static npos = <optimized out>,
 _M_dataplus = {<std::allocator<wchar_t>> =
 {<__gnu_cxx::new_allocator<wchar_t>> = {<No data fields>},
<No data fields>}, _M_p = 0xb7fbda7c L""}}

Answer by Marco Veglio

How do you want the output string encoded: UTF-16, UCS-2/UCS-4, or something else? If either of the following conversions is acceptable, you can try one of them (both need C++11 and the <codecvt> header):

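#include <codecvt>
#include <locale>

// UTF-8 -> UTF-16 (one UTF-16 code unit per wchar_t)
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t> > utf16Converter;
aDst = utf16Converter.from_bytes(aSrc);

// UTF-8 -> UCS-2 or UCS-4, depending on sizeof(wchar_t)
std::wstring_convert<std::codecvt_utf8<wchar_t> > ucsConverter;
aDst = ucsConverter.from_bytes(aSrc);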
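For reference, the codecvt_utf8 option wrapped as a complete function. It requires C++11, so it does not satisfy the question's C++98 constraint, and std::wstring_convert and <codecvt> have been deprecated since C++17. Utf8ToWide is a hypothetical name:

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper (C++11): UTF-8 -> wide string via codecvt_utf8.
// With 32-bit wchar_t (e.g. Linux) the result is UTF-32; with 16-bit
// wchar_t (e.g. Windows) it is UCS-2, so characters outside the BMP fail.
std::wstring Utf8ToWide(const std::string& aSrc)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t> > converter;
    return converter.from_bytes(aSrc);  // throws std::range_error on invalid input
}
```

Because the facet is constructed explicitly, this does not depend on the global locale.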

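I'm not sure a conversion through the generic std::codecvt<wchar_t, char, std::mbstate_t> facet will work for UTF-8 input, since that facet converts according to a locale's multibyte encoding, but you can try the following. Because std::codecvt has a protected destructor, wstring_convert cannot own it directly, so it needs a small derived wrapper:

struct deletable_codecvt : std::codecvt<wchar_t, char, std::mbstate_t> { ~deletable_codecvt() {} };
std::wstring_convert<deletable_codecvt> converter;
aDst = converter.from_bytes(aSrc);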

You may want to have a look at http://en.cppreference.com/w/cpp/locale/codecvt for more information.