How do you remove surrogate values from a std::string in C++? I'm looking for a regular expression like this:
string pattern = u8"[\uD800-\uDFFF]";
regex regx(pattern);
name = regex_replace(name, regx, "_");
How do you do this in a cross-platform C++ project (e.g. one built with CMake)?
First off, you can't store UTF-16 surrogates in a std::string (char-based). You would need std::u16string (char16_t-based), or std::wstring (wchar_t-based) on Windows only. JavaScript strings are UTF-16 strings.

For those string types, you can use either:
std::remove_if() + std::basic_string::erase()
std::erase_if() (C++20 and later only)

UPDATE: You edited your question to change its semantics. Originally, you asked how to remove surrogates; now you are asking how to replace them instead. You can use std::replace_if() for that task. Or, if you really want a regex-based approach, you can use std::regex_replace().
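
To make the options above concrete, here is a rough sketch rather than a drop-in implementation. The helper names (is_surrogate, remove_surrogates, replace_surrogates, replace_surrogates_regex) are made up for illustration. The regex variant assumes your std::wstring holds UTF-16 code units, as it does on Windows; on platforms where wchar_t is 32-bit, surrogates would not normally appear in a well-formed std::wstring at all.

```cpp
#include <algorithm>
#include <regex>
#include <string>

// True for any UTF-16 surrogate code unit (U+D800 through U+DFFF).
inline bool is_surrogate(char16_t ch) {
    return ch >= 0xD800 && ch <= 0xDFFF;
}

// Remove every surrogate code unit using the erase-remove idiom (C++11).
// In C++20, std::erase_if(s, is_surrogate) does the same in one call.
inline std::u16string remove_surrogates(std::u16string s) {
    s.erase(std::remove_if(s.begin(), s.end(), is_surrogate), s.end());
    return s;
}

// Replace every surrogate code unit with an underscore.
inline std::u16string replace_surrogates(std::u16string s) {
    std::replace_if(s.begin(), s.end(), is_surrogate, u'_');
    return s;
}

// Regex-based replacement. Assumption: the std::wstring stores UTF-16
// code units (Windows). The \u escapes are parsed by the regex engine
// (ECMAScript grammar), not by the compiler, hence the doubled backslash.
inline std::wstring replace_surrogates_regex(const std::wstring& s) {
    static const std::wregex surrogate(L"[\\uD800-\\uDFFF]");
    return std::regex_replace(s, surrogate, L"_");
}
```

Note that each of these works on individual code units, so a valid surrogate pair (one character outside the BMP) turns into two underscores when replaced. All of this is standard library only, so nothing special is needed in CMake beyond selecting the C++ standard you want (C++20 if you use std::erase_if).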