Why is ᏌᏊ ᎢᏳᎾᎵᏍᏔᏅ ᏍᎦᏚᎩ the native name of the U.S.?


When I use this code:

var ri = new RegionInfo("us");
var nativeName = ri.NativeName;   // ᏌᏊ ᎢᏳᎾᎵᏍᏔᏅ ᏍᎦᏚᎩ

Why is nativeName the string "ᏌᏊ ᎢᏳᎾᎵᏍᏔᏅ ᏍᎦᏚᎩ" (Cherokee)?

If I change it to new RegionInfo("US") (the only difference being the upper-case US), I get "United States" instead.

I know the preferred usage of RegionInfo is to pass a specific culture name, such as:

new RegionInfo("en-US")
new RegionInfo("chr-Cher-US")

and so on, and that works. But why is Cherokee chosen over English only when I use lower-case "us"?


(Seen on Windows 10 (version 1803 "April 2018 Update"), .NET Framework 4.7.2.)


Update: This is not consistent, even on the same machine. For example, I opened PowerShell many times, each time pasting [System.Globalization.RegionInfo]'US' into it. For a long period, all PowerShell instances consistently give the same result. But after a while, new instances give the opposite result. Here is a screenshot of two of the windows, one consistently showing one NativeName and the other consistently showing the other. So something non-deterministic must be going on (the casing is the same in both):

[Screenshot: two PowerShell windows showing different NativeName results for 'US']

Answer by Gabriel Luci

The first thing to note is that the constructor for RegionInfo finds the region by finding a culture used in that region. So it's looking for a language in that country, not just the country.

Reading through the source code for RegionInfo, it seems the difference between upper and lower case comes from how the lookups are done when no culture is specified with the region.

For example, it tries a couple of things first, and then it looks in a static list of regions. Because that lookup uses Dictionary.ContainsKey on a case-sensitive dictionary, it is a case-sensitive search. So if you specify "US", it will find it, but not "us".
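To see why "US" and "us" can diverge at this step, here is a minimal sketch of the case-sensitivity difference. The dictionary and its single entry are illustrative assumptions standing in for the internal static region list, which is not public; the only point is that a Dictionary<string, TValue> created without a comparer matches keys case-sensitively, while one created with StringComparer.OrdinalIgnoreCase does not. It is written with C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the static region list: default comparer,
// which compares string keys ordinally and case-sensitively.
var caseSensitive = new Dictionary<string, string>
{
    ["US"] = "en-US",
};

// The same data with an explicit case-insensitive comparer, for contrast.
var caseInsensitive = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["US"] = "en-US",
};

bool sensitiveUpper = caseSensitive.ContainsKey("US");     // exact match
bool sensitiveLower = caseSensitive.ContainsKey("us");     // different key, not found
bool insensitiveLower = caseInsensitive.ContainsKey("us"); // comparer ignores case

Console.WriteLine($"{sensitiveUpper} {sensitiveLower} {insensitiveLower}");
```

This prints True False True: only the lower-case lookup against the default-comparer dictionary misses, which is what sends "us" on to the later, case-insensitive search.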

Later, it searches through all the cultures (from CultureInfo.GetCultures(CultureTypes.SpecificCultures)) for the region you gave, but it does so in a case-insensitive way.

I can't confirm this, since I can't step through that code, but my guess is that because it goes through the list in order, it reaches chr-Cher-US before en-US.
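The guess about enumeration order can be checked on a given machine. This sketch lists the specific cultures whose region is the United States, in the order CultureInfo.GetCultures(CultureTypes.SpecificCultures) returns them. The helper RegionOf is hypothetical, and matching on RegionInfo.TwoLetterISORegionName only approximates the internal comparison; it is not the actual .NET code. The output depends on the OS and runtime, so no expected output is given.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper: the two-letter region code for a culture, or null if
// RegionInfo rejects the culture name (CultureNotFoundException is an ArgumentException).
static string? RegionOf(CultureInfo culture)
{
    try
    {
        return new RegionInfo(culture.Name).TwoLetterISORegionName;
    }
    catch (ArgumentException)
    {
        return null;
    }
}

// Specific cultures whose region is US, kept in enumeration order and matched
// case-insensitively, like the search described above.
var usCultures = CultureInfo.GetCultures(CultureTypes.SpecificCultures)
    .Where(c => string.Equals(RegionOf(c), "US", StringComparison.OrdinalIgnoreCase))
    .Select(c => c.Name)
    .ToList();

// If chr-Cher-US is printed before en-US, a first-match scan would pick Cherokee.
foreach (var name in usCultures)
    Console.WriteLine(name);
```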

Why is it not consistent?

One of the comments said that LINQPad finds Cherokee even when using upper case. I don't know why this is. I was able to replicate that, but I also found that in Visual Studio, it's English when using "US" and Cherokee when using "us", as you describe. However, if I turn on "Use experimental Roslyn assemblies" in LINQPad, it returns English for both "US" and "us". So it may have something to do with the exact runtime version targeted, but I can't say for sure.

One thing that affects consistency is caching: when it does not get a complete match by culture and region, the first thing it does is check a cache of already-found cultures. It lower-cases all the keys in that cache, so the cache is effectively case-insensitive.

You can test this. We know that using "US" vs. "us" will yield different results, but try this in the same program:

var nativeNameus = new RegionInfo("us").NativeName;  // first call: the culture it finds is cached
var nativeNameUS = new RegionInfo("US").NativeName;  // second call: reuses the cached culture

Then swap them and run it again:

var nativeNameUS = new RegionInfo("US").NativeName;  // first call: the culture it finds is cached
var nativeNameus = new RegionInfo("us").NativeName;  // second call: reuses the cached culture

Within a single run, both results will always be equal, because the culture found by the first call is cached and reused by the second.
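The order dependence can be illustrated with a hypothetical, much simplified model of the three lookup steps described in this answer: a cache with lower-cased keys, a case-sensitive static table, and a case-insensitive scan over a culture list. All names (Resolve, cache, staticTable, cultures) and data are illustrative, and placing chr-Cher-US ahead of en-US in the list is an assumption matching the guess above. This is a sketch of the behavior, not the real .NET implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model, not the real .NET internals.
// 1. Cache of resolved regions; keys are lower-cased, so lookups ignore case.
var cache = new Dictionary<string, string>();
// 2. Static region table with the default (case-sensitive) comparer.
var staticTable = new Dictionary<string, string> { ["US"] = "en-US" };
// 3. Culture list scanned in order (assumption: Cherokee comes first).
var cultures = new[] { "chr-Cher-US", "en-US" };

string Resolve(string region)
{
    string key = region.ToLowerInvariant();
    if (cache.TryGetValue(key, out var cached))
        return cached; // cache hit, whatever casing filled it

    // Case-sensitive table first, then a case-insensitive first-match scan.
    string culture = staticTable.TryGetValue(region, out var fromTable)
        ? fromTable
        : cultures.First(c => c.EndsWith("-" + region, StringComparison.OrdinalIgnoreCase));

    cache[key] = culture;
    return culture;
}

string first = Resolve("us");   // table misses, scan finds chr-Cher-US, cached under "us"
string second = Resolve("US");  // cache hit: chr-Cher-US
Console.WriteLine($"{first} {second}");

cache.Clear();                  // simulate a fresh process
string third = Resolve("US");   // table hit: en-US, cached under "us"
string fourth = Resolve("us");  // cache hit: en-US
Console.WriteLine($"{third} {fourth}");
```

In this model the first line prints chr-Cher-US chr-Cher-US and the second prints en-US en-US: whichever casing is resolved first fills the cache and decides the answer for both.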

It's possible that code other than yours calls the same methods first and caches a culture value, which then changes the result you get when you make the same call.

Conclusion

All that said, the docs actually say:

We recommend that you use the culture name—for example, "en-US" for English (United States)—to access the NativeName property.

So it is a bit of a moot point: you asked for a region, not a language. If you need a specific language, ask for that language, not just a region.

If you want to guarantee English, then either:

  1. Do as Microsoft recommends and specify the language with the region: "en-US", or
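  2. Use the EnglishName or DisplayName properties (which are English even when the NativeName is Cherokee; note that DisplayName is in the language of the localized version of .NET, so this holds on an English installation).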
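A short sketch of both options, assuming a .NET version that supports top-level statements (C# 9 or later). The exact strings printed depend on the OS and runtime, so no output is shown; the point is which constructor argument and which property each option relies on.

```csharp
using System;
using System.Globalization;

// Option 1: name the language together with the region, as the docs recommend.
var english = new RegionInfo("en-US");
Console.WriteLine(english.NativeName);

// Option 2: keep the bare region code, but read EnglishName (or DisplayName)
// instead of NativeName, so the text does not depend on which culture was picked.
var bare = new RegionInfo("us");
Console.WriteLine(bare.NativeName);   // may be Cherokee, as in the question
Console.WriteLine(bare.EnglishName);
Console.WriteLine(bare.DisplayName);  // language follows the localized version of .NET
```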