Convert a string with accents from the console into a UTF-8 wstring


When I enter 'café' in the Windows console, the wide string I get is 'caf' followed by a wrong character: 'c' code: 99, 'a' code: 97, 'f' code: 102, '' code: 130 (or other strange values with the stuff I found on the internet). 233 is the correct value, which is the Unicode code point of 'é' (U+00E9).

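#undef      UNICODE
#define     UNICODE
#include <iostream>
#include <string>

std::wstring wstrCharsList;
std::getline(std::wcin, wstrCharsList);                     // read one line of wide input
if (!std::wcin.good()) std::wcout << L"problem !\n";        // stay on wide streams; do not mix cout and wcout
std::wcout << wstrCharsList << std::endl;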

I tried ALL the stuff I found in other SO questions and on the web (especially https://alfps.wordpress.com/2011/12/08/unicode-part-2-utf-8-stream-mode/) and nothing worked.

I need a UTF-8 encoded wstring to pass to my API, which performs some string comparisons against strings loaded from a UTF-8 encoded text file.

NB: On Linux my program works correctly. FU Microsoft.


There is 1 answer

Aminos On BEST ANSWER

By tweaking, I found the solution below:

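const wchar_t * ConvertToUTF16(const char * pStr)
{
   // Static buffer: not thread-safe, and input longer than 1023 characters fails.
   static wchar_t wszBuf[1024];
   // The last argument is a count of wchar_t elements, not a size in bytes.
   if (MultiByteToWideChar(CP_OEMCP, 0, pStr, -1, wszBuf, sizeof(wszBuf) / sizeof(wszBuf[0])) == 0)
      wszBuf[0] = L'\0';   // conversion failed: return an empty string
   return wszBuf;
}
...
std::string strExtAsciiInput;
std::getline(std::cin, strExtAsciiInput);   // raw bytes in the console's code page
std::wstring wstrTest = ConvertToUTF16(strExtAsciiInput.c_str());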

And miraculously, 'café' is converted correctly in the wstring: 'é' has code 233! Can anyone explain to me why this works? When I pass CP_UTF8 to MultiByteToWideChar, the output is incorrect ('é' is wrong, 2 bytes), but with CP_OEMCP it is parsed correctly and 'é' has the expected code... Seriously, WTF?
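A likely explanation, sketched under stated assumptions: the console hands the program bytes in its OEM code page, not in UTF-8. In CP437 and CP850, 'é' is the single byte 130 (0x82), which matches the value seen in the question. Decoding that byte as OEM gives the Unicode code point U+00E9 (233), which is what ends up in the wide string. CP_UTF8 fails because a lone 0x82 byte is not valid UTF-8. Note that 233 is the code point, not the UTF-8 encoding; in UTF-8, 'é' is the two bytes 0xC3 0xA9. The portable sketch below illustrates both steps. `oemByteToCodePoint` and `toUtf8` are hypothetical helper names, and the one-entry table stands in for the full code page that MultiByteToWideChar would apply.

```cpp
#include <cassert>
#include <string>

// Hypothetical one-entry excerpt of an OEM code page table (assumption:
// the console uses CP437 or CP850, which both map byte 0x82 to 'é').
char32_t oemByteToCodePoint(unsigned char b) {
    if (b < 0x80) return b;        // the ASCII range is unchanged
    if (b == 0x82) return 0x00E9;  // 'é'
    return 0xFFFD;                 // not covered by this sketch: replacement character
}

// Encode one code point below U+10000 as UTF-8 bytes.
std::string toUtf8(char32_t cp) {
    assert(cp < 0x10000);          // this sketch handles 1- to 3-byte sequences only
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With these helpers, `oemByteToCodePoint(130)` yields 233, and `toUtf8(233)` yields the two bytes 0xC3 0xA9. So the wstring produced by the answer holds UTF-16 code units, and a separate encoding step would be needed if the API really expects UTF-8 bytes.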