I'm writing a PHP application that supports multiple languages.
When setting the locale in PHP, I have to provide a value in what I believe, according to the setlocale documentation, is the RFC 1766 / ISO 639 format.
setlocale( LC_ALL, 'en_US' );
var_dump( setlocale( LC_MESSAGES, '0' ) );
// string(5) "en_US"
When I use this locale as the value of the HTML lang attribute, validation fails because it is not formatted according to RFC 5646. The RFC 5646 value for this language is actually en-US (note the hyphen instead of an underscore).
Using this value in PHP's setlocale function, as above, results in the following output:
string(1) "C"
I have no idea why it returns C, but I presume it is because the locale I provided was incorrectly formatted, so the locale stayed at C, the original server default, which is described as ASCII (thanks to @Cheery for the reference).
So, I'm wondering what I should do about that. I could, feasibly, use PHP's str_replace function to switch _ to - before outputting the lang attribute, like so:
<?php setlocale( LC_ALL, 'en_US' ); ?>
<!doctype html>
<html lang="<?= str_replace( '_', '-', setlocale(LC_MESSAGES, '0') ); ?>">
...
But I'm concerned that there may be other differences between the two language specifications that could cause an unexpected problem down the road. If so, is there a preferred way to translate the language codes that already exists in PHP, or a translation class that can be used?
Bonus question: why does my server default to a value of C for the locale?
Keep in mind that setlocale accepts many kinds of locale names, including names with extra parts mixed in. For example, the PHP documentation uses 'de_DE@euro', which isn't a valid HTML lang code.
So first, you need to make sure the value is in the format lang_region before trying to convert it.
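As a rough sketch of that check, something like the following could work. The function name localeToLangTag is hypothetical, and the pattern is an assumption: it only accepts a two- or three-letter lowercase language with an optional two-letter uppercase region, after dropping any codeset or modifier suffix. It does not cover everything RFC 5646 allows (scripts, variants and so on).

```php
<?php
// Hypothetical helper (not part of PHP): turns a POSIX-style locale name
// such as "en_US.UTF-8" or "de_DE@euro" into a hyphenated language tag.
// Returns null when the name is not in the plain lang or lang_REGION form.
function localeToLangTag(string $locale): ?string
{
    // Drop an optional codeset (".UTF-8") and/or modifier ("@euro").
    $base = preg_replace('/[.@].*$/', '', $locale);

    // Assumed shape: "lang" or "lang_REGION". This rejects "C", "POSIX",
    // already-hyphenated input, and anything else that is unexpected.
    if (!preg_match('/^([a-z]{2,3})(?:_([A-Z]{2}))?$/', $base, $m)) {
        return null;
    }

    // Join language and region with a hyphen when a region is present.
    return isset($m[2]) ? $m[1] . '-' . $m[2] : $m[1];
}
```

When the function returns null (for example because the current locale is still C), you would fall back to a default you choose yourself, such as localeToLangTag($current) ?? 'en', rather than printing the raw locale name into the lang attribute.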