How to cast accented letters (wchar_t) to char?

I ported an application from Windows to Linux and ran into a character-encoding problem: accented letters (e.g. 'é', 'à') seem to be treated as wchar_t (4 bytes with g++), whereas under Visual Studio they take 1 byte (char). My unit tests fail because my code compares characters against accented letters, and on Linux those letters are multibyte.

Is it possible to cast accented letters (like 'û') to the Windows (1-byte) encoding on Linux, or should I refactor my code to use std::wstring instead?

There is 1 answer

Answer by Christophe (best answer)

If 'é' fits in a single char on Windows, your application was probably compiled without UNICODE and most likely with the Windows-1252 encoding, where 'é' is the single byte 0xE9.
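If the existing data or literals really are Windows-1252, one option is to convert them to UTF-8 at the boundary. The following is only a minimal sketch: `cp1252_to_utf8` and `check_cp1252_to_utf8` are hypothetical names, and the function covers only ASCII and the range 0xA0..0xFF, where Windows-1252 byte values coincide with Unicode code points. Bytes 0x80..0x9F (for example the euro sign at 0x80) would need a lookup table and are replaced by '?' here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: convert Windows-1252 text to UTF-8.
// Sketch only: bytes 0x80..0x9F are NOT mapped correctly (replaced by '?').
inline std::string cp1252_to_utf8(const std::string& in) {
    std::string out;
    for (unsigned char b : in) {
        if (b < 0x80) {
            out.push_back(static_cast<char>(b));   // plain ASCII, unchanged
        } else if (b < 0xA0) {
            out.push_back('?');                    // unsupported in this sketch
        } else {
            // Code points U+00A0..U+00FF become two UTF-8 bytes.
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    return out;
}

// "caf\xE9" is "café" in Windows-1252; in UTF-8 the 'é' becomes 0xC3 0xA9.
inline void check_cp1252_to_utf8() {
    assert(cp1252_to_utf8("caf\xE9") == "caf\xC3\xA9");
}
```

Byte escapes are used instead of a literal 'é' so that the example does not depend on the encoding of the source file.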

With the usual UTF-8 encoding on Linux, 'é' requires 2 bytes (two char values, 0xC3 0xA9). Writing it as a char literal should trigger a compiler warning about a multi-character constant; such a constant has type int, which is probably why it appears to take 4 bytes with g++. And if you used the resulting character, it would hold only part of the encoding, so a char-by-char comparison would be meaningless.
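To make the problem concrete, here is a small sketch of why a byte-wise comparison stops working once the text is UTF-8. The helper names (`utf8_e_acute`, `contains_byte`, `check_byte_comparison`) are made up for illustration, and the strings are written with byte escapes so the result does not depend on how the source file is saved.

```cpp
#include <cassert>
#include <string>

// UTF-8 encoding of 'é' (U+00E9): two bytes, 0xC3 followed by 0xA9.
inline std::string utf8_e_acute() { return "\xC3\xA9"; }

// Hypothetical helper: the kind of single-byte search that works with
// Windows-1252 text, where 'é' is the one byte 0xE9.
inline bool contains_byte(const std::string& s, unsigned char wanted) {
    for (unsigned char c : s) {
        if (c == wanted) return true;
    }
    return false;
}

inline void check_byte_comparison() {
    // One accented letter, but two char units in UTF-8.
    assert(utf8_e_acute().size() == 2);
    // Windows-1252 "café": the byte 0xE9 is found.
    assert(contains_byte("caf\xE9", 0xE9));
    // UTF-8 "café": no single byte equals 0xE9, so the search fails.
    assert(!contains_byte("caf\xC3\xA9", 0xE9));
}
```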

If you want to keep algorithms that work on the individual characters of a string, you'd be better off working with wchar_t and std::wstring (or, even more portable, char32_t and std::u32string).
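As a rough illustration of the char32_t approach, the sketch below decodes UTF-8 into a std::u32string so that each accented letter is again a single element that can be compared directly. `decode_utf8`, `count_char` and `check_decode_utf8` are hypothetical helpers written for this example, not standard library functions, and the decoder assumes well-formed input and performs no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 into UTF-32 code points.
// Sketch only: assumes well-formed UTF-8, no error handling.
inline std::u32string decode_utf8(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes to read
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        ++i;
        for (std::size_t k = 0; k < extra && i < in.size(); ++k, ++i) {
            // Each continuation byte contributes its low 6 bits.
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}

// Hypothetical helper: character-by-character comparison on code points.
inline std::size_t count_char(const std::u32string& s, char32_t wanted) {
    std::size_t n = 0;
    for (char32_t c : s) {
        if (c == wanted) ++n;
    }
    return n;
}

inline void check_decode_utf8() {
    const std::u32string text = decode_utf8("caf\xC3\xA9");  // UTF-8 "café"
    assert(text.size() == 4);                    // four characters, not five bytes
    assert(text == U"caf\u00E9");
    assert(count_char(text, U'\u00E9') == 1);    // 'é' compares as one element
}
```

The literals use `\u00E9` rather than a raw 'é' so that the comparison does not depend on the source file encoding.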

If you want to learn more about character encoding and Unicode in C++, I warmly recommend the excellent video tutorial on Unicode in C++ by James McNellis.