Using modern C++ and the standard library, what is the easiest and cleanest way to convert a std::string containing Windows-1252 encoded characters to UTF-8?

My use case: I'm parsing CSV files which are Windows-1252 encoded, and then pushing some of their data to Node.js using Node-API (node-addon-api), which requires UTF-8 encoded strings.
Using just the standard library, the closest solution would probably be to use std::wstring_convert with a custom Windows-1252 facet to convert the std::string to a std::wstring, and then use std::wstring_convert with a standard UTF-8 facet to convert the std::wstring to a std::string.

However, std::wstring_convert is deprecated since C++17, with no replacement in sight. So you are better off using a 3rd-party Unicode library to handle the conversion, such as iconv, ICU, etc., or platform-specific APIs, like MultiByteToWideChar() and WideCharToMultiByte() on Windows.

Or, you could simply implement the conversion yourself, since Windows-1252 is a very simple encoding: it has only 251 characters defined. A trivial lookup table mapping each 8-bit character to its UTF-8 equivalent would suffice.
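A sketch of that last approach could look like the following. Only bytes 0x80–0x9F need a table, since 0x00–0x7F are plain ASCII and 0xA0–0xFF map directly to the Unicode code points of the same value. The function name is arbitrary, and mapping the five unassigned bytes to U+FFFD (the replacement character) is my choice, not the only option:

```cpp
#include <array>
#include <string>

// Unicode code points for Windows-1252 bytes 0x80-0x9F.
// 0xFFFD marks the five unassigned bytes: 0x81, 0x8D, 0x8F, 0x90, 0x9D.
constexpr std::array<char32_t, 32> kCp1252High = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178,
};

std::string cp1252_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 3);  // UTF-8 needs at most 3 bytes per CP1252 char
    for (unsigned char c : in) {
        char32_t cp;
        if (c < 0x80)       cp = c;                      // ASCII passes through
        else if (c < 0xA0)  cp = kCp1252High[c - 0x80];  // table lookup
        else                cp = c;                      // 0xA0-0xFF == U+00A0-U+00FF
        // Encode the code point as UTF-8.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, cp1252_to_utf8("\x80") yields the three-byte UTF-8 euro sign "\xE2\x82\xAC". Since every Windows-1252 byte maps to a code point below U+FFFF, the two- and three-byte UTF-8 cases are the only ones needed.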