Using modern C++ and the std library, what is the easiest and cleanest way to convert a std::string containing windows-1252 encoded characters to utf-8?
My use case is that I'm parsing a CSV file which is windows-1252 encoded, and then pushing some of its data to Node.js using Node-API (node-addon-api), which requires UTF-8 encoded strings.
Using just the standard library, the closest solution would probably be to use std::wstring_convert with a custom Windows-1252 facet to convert the std::string to a std::wstring, and then use std::wstring_convert with a standard UTF-8 facet to convert the std::wstring to a std::string.
However, std::wstring_convert has been deprecated since C++17, with no replacement in sight. So you are better off using a 3rd-party Unicode library to handle the conversion, such as iconv or ICU, or platform-specific APIs, like MultiByteToWideChar() and WideCharToMultiByte() on Windows.
Or, you could simply implement the conversion yourself, since Windows-1252 is a very simple encoding: it has only 251 characters defined (5 of the 256 byte values are unassigned). A trivial lookup table to convert each 8-bit character to its UTF-8 equivalent would suffice.
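For illustration, here is a minimal sketch of the lookup-table approach. The function name cp1252_to_utf8 and the table name kCp1252High are made up for this example. It relies on the fact that bytes 0x00-0x7F and 0xA0-0xFF map to the Unicode code point with the same value, so only 0x80-0x9F need a table. Mapping the five unassigned bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) to U+FFFD is an assumption on my part; you may prefer to reject them or pass them through as C1 controls.

```cpp
#include <string>
#include <string_view>

// Unicode code points for Windows-1252 bytes 0x80..0x9F.
// 0xFFFD (replacement character) stands in for the five unassigned bytes;
// that choice is an assumption of this sketch, not part of the encoding.
constexpr char16_t kCp1252High[32] = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178,
};

// Hypothetical helper: converts a Windows-1252 byte string to UTF-8.
std::string cp1252_to_utf8(std::string_view in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char b : in) {
        // Outside 0x80..0x9F the byte value is already the code point.
        char32_t cp = b;
        if (b >= 0x80 && b <= 0x9F) cp = kCp1252High[b - 0x80];

        if (cp < 0x80) {            // 1-byte UTF-8 sequence (ASCII)
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {    // 2-byte UTF-8 sequence
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                    // 3-byte UTF-8 sequence (max needed here)
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The returned std::string should then be usable wherever UTF-8 is expected, for example when passed to Napi::String::New(env, utf8) in node-addon-api.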