convert case of wide characters, given the LCID (Visual C++)


I have some existing Visual C++ code where I need to add the conversion of wide character strings to upper or lower case.

I know there are pitfalls to this (such as the Turkish "I"), but most of these can be ironed out if you know the language. Fortunately, in this area of code I know the LCID value (locale ID), which I guess is the same as knowing the language.

As LCID is a Windows type, is there a Windows function that will convert wide strings to upper or lower case?

The C runtime function _towupper_l() sounds like it would be ideal but it takes a _locale_t parameter instead of LCID, so I guess it's unsuitable unless there is a completely reliable way of converting an LCID to a _locale_t.

Answer by Cody Gray - on strike

The function you're searching for is called LCMapString, and it is part of the Windows NLS APIs. The LCMAP_UPPERCASE flag maps characters to uppercase, while the LCMAP_LOWERCASE flag maps characters to lowercase.

For applications targeting Windows Vista and later, there is an Ex variant (LCMapStringEx) that takes a locale name instead of a locale identifier. Microsoft now recommends locale names over LCIDs.
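If you only have an LCID but want to follow that recommendation, one possible approach is to convert it with LCIDToLocaleName and pass the result to LCMapStringEx. The following is only a rough sketch, not tested against every locale: UpperForLcid is a hypothetical helper name (not a Windows API), it reports failures by throwing, and it assumes the string length fits in an int. It uses a separate output buffer and asks the API for the required length first.

```cpp
#include <string>

#ifdef _WIN32   // the NLS APIs used here exist only on Windows (Vista and later)
#include <windows.h>
#include <stdexcept>

// Hypothetical helper (not a Windows API): uppercases `text` using the
// linguistic casing rules of the locale identified by `lcid`.
std::wstring UpperForLcid(LCID lcid, const std::wstring& text)
{
    if (text.empty()) return text;  // a zero-length source is not accepted

    // Translate the LCID into a locale name such as L"tr-TR".
    wchar_t localeName[LOCALE_NAME_MAX_LENGTH];
    if (!LCIDToLocaleName(lcid, localeName, LOCALE_NAME_MAX_LENGTH, 0))
        throw std::runtime_error("LCIDToLocaleName failed");

    const DWORD flags = LCMAP_UPPERCASE | LCMAP_LINGUISTIC_CASING;
    const int srcLen = static_cast<int>(text.length());

    // First call: a zero-sized destination asks for the required length.
    const int needed = LCMapStringEx(localeName, flags, text.c_str(), srcLen,
                                     nullptr, 0, nullptr, nullptr, 0);
    if (needed == 0)
        throw std::runtime_error("LCMapStringEx (size query) failed");

    // Second call: write the mapped string into a separate buffer.
    std::wstring result(static_cast<size_t>(needed), L'\0');
    if (!LCMapStringEx(localeName, flags, text.c_str(), srcLen,
                       &result[0], needed, nullptr, nullptr, 0))
        throw std::runtime_error("LCMapStringEx failed");
    return result;
}
#endif  // _WIN32
```

Calling GetLastError right after a failed call (before throwing) would give you the specific error code if you need it.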

In fact, in the CRT implementation provided with VS 2010 (and presumably other versions as well), functions such as _towupper_l ultimately end up calling LCMapString after they extract the locale ID (LCID) from the specified _locale_t.

If you're like me, and less familiar with the i18n APIs than you should be, you probably already know about the CharUpper, CharLower, CharUpperBuff, and CharLowerBuff family of functions. These have been the old standbys since the early days of Windows for altering the case of chars/strings, but as their documentation warns:

Note that CharXxx always maps uppercase I to lowercase I ("i"), even when the current language is Turkish or Azeri. If you need a function that is linguistically sensitive in this respect, call LCMapString.

What it neglects to mention is filled in by a couple of posts on Michael Kaplan's wonderful blog on internationalization issues: "What does 'linguistic casing' mean?" and "How best to alter case". The executive summary is that calling LCMapString without the LCMAP_LINGUISTIC_CASING flag gives the same results as the CharXxx family of functions, whereas specifying LCMAP_LINGUISTIC_CASING makes the mapping linguistically sensitive.
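To make the difference concrete, here is a toy model of the dotted/dotless "I" rule for Turkish and Azeri. It is only an illustration written in portable C++: ToyUpper and its linguisticTurkic parameter are made-up names, it handles nothing beyond ASCII a-z and the two Turkic letters, and it is not how LCMapString is implemented.

```cpp
#include <string>

// Toy model only: covers ASCII a-z plus the two Turkic "i" letters and leaves
// every other character untouched. It is NOT how LCMapString is implemented.
std::wstring ToyUpper(const std::wstring& text, bool linguisticTurkic)
{
    std::wstring out = text;
    for (wchar_t& ch : out) {
        if (linguisticTurkic && ch == L'i')
            ch = L'\u0130';   // dotted i -> capital I with dot above (U+0130)
        else if (linguisticTurkic && ch == L'\u0131')
            ch = L'I';        // dotless i (U+0131) -> plain capital I
        else if (ch >= L'a' && ch <= L'z')
            ch = static_cast<wchar_t>(ch - L'a' + L'A');  // ordinary ASCII casing
    }
    return out;
}
```

With linguisticTurkic set to false, "title" becomes "TITLE", which corresponds to the CharXxx-style behaviour quoted above. With it set to true, the "i" becomes U+0130 instead, which is the kind of result you should expect from LCMAP_LINGUISTIC_CASING with a Turkish or Azeri locale.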

Sample code:

std::wstring test(L"Does my code pass the Turkey test?");  /* note the L prefix: wide literal */
if (!LCMapStringW(lcid,            /* your LCID, defined elsewhere */
                  LCMAP_UPPERCASE | LCMAP_LINGUISTIC_CASING,
                  test.c_str(),    /* input string */
                  static_cast<int>(test.length()),   /* length of input string */
                  &test[0],        /* output buffer (can reuse input for case mapping) */
                  static_cast<int>(test.length())))  /* length of output buffer (same as input) */
{
   // Uh-oh! Something went wrong in the call to LCMapString, so you need to
   // handle the error somehow here.
   // A good start is calling GetLastError to determine the error code.
}