What is under the hood of std::tolower?

I was just reading about std::tolower on cppreference.

Is std::tolower maybe just a wrapper around a std::use_facet call?

Please see the following example:

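#include <cctype>    // one-argument std::tolower(int)
#include <iostream>
#include <locale>    // std::locale, std::use_facet, std::ctype

int main() {
    char c1{ 'A' }, c2{ 'B' };

    // Facet-based conversion using the "C" locale.
    std::cout << std::use_facet<std::ctype<char>>(std::locale("C")).tolower(c1) << '\n';
    // C-library conversion: takes and returns int, so cast on the way in and out.
    std::cout << static_cast<char>(std::tolower(static_cast<unsigned char>(c2))) << '\n';
}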

Yes, std::tolower takes and returns int, but apart from that, is it calling use_facet or something similar?

Answer by Dúthomhas

What is under the hood of std::tolower?

Absolutely nothing useful.

Supposedly the library can use a locale to handle language concerns, but as things currently stand in C++, this remains a long, frustrating pipe dream.
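
For reference, here is a minimal sketch of the three standard ways to lowercase a single char, side by side. As far as the standard specifies, the one-argument std::tolower from <cctype> is the C library function and consults the global C locale (the one set with std::setlocale), while the two-argument std::tolower from <locale> is defined in terms of std::use_facet<std::ctype<charT>>(loc).tolower(c). Whether a particular implementation routes the <cctype> version through a facet internally is not something the standard promises. The helper names below are hypothetical, chosen only for illustration.

```cpp
#include <cctype>   // one-argument std::tolower(int)
#include <locale>   // two-argument std::tolower, std::use_facet, std::ctype

// Hypothetical helper: the C library version, driven by the global C locale.
char lower_c(char c) {
    // Cast to unsigned char first; passing a negative char value is undefined behavior.
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Hypothetical helper: the <locale> convenience overload with an explicit locale.
char lower_loc(char c, const std::locale& loc = std::locale::classic()) {
    return std::tolower(c, loc);
}

// Hypothetical helper: the facet call that the <locale> overload is specified in terms of.
char lower_facet(char c, const std::locale& loc = std::locale::classic()) {
    return std::use_facet<std::ctype<char>>(loc).tolower(c);
}
```

All three map one char to one char, which is exactly why none of them can handle case mappings that change the length of the text.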

What do I do, then?

Use International Components for Unicode (ICU), which originated at IBM and is now maintained by the Unicode Consortium. It is a mature, stable library that exists on practically every system where it makes sense to program with i18n: Android and iOS phones (and all the knock-offs), Windows, Linux, Unix, macOS, etc.

The tricky part is just interfacing with the ICU installed on the system. How you do that differs from system to system, but it is not particularly difficult. (It becomes part of the build script, like any other system-dependent dependency.)

ICU works with both C and C++ (though the C++ capability is quite a bit leaner than the C capability).

(You can also use it with Java, and ported interfaces exist for quite a few other languages as well.)

Since you tagged the question C++, I recommend you just use the C capabilities of the library over a std::wstring (Windows, C++17 or earlier) or a std::u16string (Windows with C++20 or later, and everything else).

Boost Libraries

Boost provides a very nice C++ library to do this kind of stuff.

You can configure Boost Locale to use ICU as a backend.

I haven’t messed with it for quite a long time, and configuring the build is tricky (Boost Locale is one of the Boost libraries that has to be compiled). Make your way through that and you are golden, though.

Caveats

Managing your locale becomes important. Your program should default to using the user’s system-indicated locale. ICU makes this easy to access and use.
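
ICU has its own API for this, which is not shown here. As a rough standard-library analogue, constructing std::locale with an empty name asks for the user's preferred locale from the environment. The sketch below assumes that falling back to the classic "C" locale is acceptable when that locale is not installed; the helper name is hypothetical.

```cpp
#include <locale>
#include <stdexcept>

// Hypothetical helper: the user's preferred locale, or the classic "C" locale
// when the environment names a locale that is not available on this system.
std::locale user_locale_or_classic() {
    try {
        return std::locale("");          // empty name = user-preferred locale
    } catch (const std::runtime_error&) {
        return std::locale::classic();   // assumed fallback for this sketch
    }
}
```

A program would typically call this once at startup and pass the result explicitly to whatever needs it, rather than relying on hidden global state.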

Not every writing system has letter case. Proper case-conversion and case-folding functions understand this and behave correctly for those languages.

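One particular point is that Turkish has a corner case you should be aware of: the letter I. Turkish has both a dotted and a dotless I, so uppercase I lowercases to ı (dotless) and uppercase İ (dotted) lowercases to i. Any reading you do on letter casing should mention this.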
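
To make the Turkish case concrete, here is a small hand-rolled sketch, for illustration only and no substitute for ICU. It assumes the input is UTF-8 and handles only ASCII letters plus the two Turkish I code points. The point it shows is that lowercasing ASCII I yields ı (U+0131), which takes two bytes in UTF-8, so no char-to-char function such as std::tolower can produce it. The function name is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, illustration-only helper: lowercases ASCII letters in a UTF-8
// string using Turkish rules for the letter I. Not a general case mapper.
std::string lower_ascii_turkish(const std::string& utf8) {
    std::string out;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        const unsigned char ch = static_cast<unsigned char>(utf8[i]);
        if (ch == 'I') {
            out += "\xC4\xB1";                        // I -> U+0131 (dotless i), two bytes
        } else if (ch == 0xC4 && i + 1 < utf8.size() &&
                   static_cast<unsigned char>(utf8[i + 1]) == 0xB0) {
            out += 'i';                               // U+0130 (dotted capital I) -> i
            ++i;                                      // skip the second byte of U+0130
        } else if (ch >= 'A' && ch <= 'Z') {
            out += static_cast<char>(ch - 'A' + 'a'); // other ASCII capitals
        } else {
            out += utf8[i];                           // everything else passes through
        }
    }
    return out;
}
```

Note that the output can be longer or shorter in bytes than the input, which is why real case conversion has to work on whole strings with a locale rather than one char at a time.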

Remember also that the right locale depends on context. For example, you will likely wish to use a different locale for program code than for strings displayed to the user.