Using modern C++ and the std library, what is the easiest and cleanest way to convert a std::string containing windows-1252 encoded characters to utf-8?

My use case: I'm parsing a CSV file that is Windows-1252 encoded, and then pushing some of its data to Node.js using Node-API (node-addon-api), which requires UTF-8 encoded strings.

There is 1 answer

Remy Lebeau On

Using just the standard library, the closest solution would probably be to use std::wstring_convert with a custom Windows-1252 facet to convert the std::string to a std::wstring, and then use std::wstring_convert with a standard UTF-8 facet to convert the std::wstring to a std::string.

However, std::wstring_convert has been deprecated since C++17, with no replacement in sight. So you are better off using a third-party library to handle the conversion, such as iconv or ICU, or platform-specific APIs, such as MultiByteToWideChar() and WideCharToMultiByte() on Windows.

Or you could simply implement the conversion yourself. Windows-1252 is a very simple single-byte encoding with only 251 characters defined, so a trivial lookup table that converts each 8-bit character to its UTF-8 equivalent would suffice.
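
For illustration, here is a minimal sketch of that lookup-table approach. The names cp1252_to_utf8 and kCp1252High are made up for this example. It relies on the fact that Windows-1252 matches ISO-8859-1 (and therefore the first 256 Unicode code points) everywhere except bytes 0x80..0x9F, so only those 32 bytes need a table. Mapping the five undefined bytes to U+FFFD is an assumption on my part; you may prefer to reject or skip them instead.

```cpp
#include <array>
#include <string>

// Unicode code points for bytes 0x80..0x9F, the only range where
// Windows-1252 differs from ISO-8859-1. The five undefined bytes
// (0x81, 0x8D, 0x8F, 0x90, 0x9D) are mapped to U+FFFD here, which is
// an assumption of this sketch, not something the encoding mandates.
constexpr std::array<char32_t, 32> kCp1252High = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178,
};

// Hypothetical helper: converts a Windows-1252 byte string to UTF-8.
std::string cp1252_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char byte : in) {
        // Outside 0x80..0x9F the byte value is already the code point.
        char32_t cp = byte;
        if (byte >= 0x80 && byte <= 0x9F) {
            cp = kCp1252High[byte - 0x80];
        }
        // Encode the code point as UTF-8. Every Windows-1252 character
        // lies in the BMP, so at most three bytes are needed.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The returned std::string should then be suitable for passing to node-addon-api, for example via Napi::String::New(env, cp1252_to_utf8(field)), where env and field stand in for whatever your addon already has in scope.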