I am playing with SML/NJ (version 110.99.4) on Windows 10.
I have a structure in a source file saved in UTF-8 encoding:
...
let
val s:string = "søk"
in
print s
end;
...
My console uses code page 65001 (UTF-8), as chcp reports.
This code prints søk. I have three questions:
- As far as I know, SML/NJ has a widestring (and widechar) type for Unicode, but it is optional, and on Windows it is actually missing. I assumed that string is an ASCII string, but it seems it is not. So what is the string type? Code points? UTF-8?
- How portable is this string from SML/NJ? Can I use it everywhere (on Linux, for example) where I want UTF-8?
- Is this behavior of string the same in all SML implementations?
PS. My SML/NJ version also has a UTF8 structure (open UTF8 succeeds), and it refers to wchar. But I see that string lets me print non-ASCII strings correctly, while the String structure refers to char. This confuses me even more: what does string contain, wchar or char (but UTF-8 encoded)? And what, then, is the missing widechar?
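To narrow the question down, here is a small probe that uses only the Basis Library (String.size, String.explode, Char.ord). It is a sketch, not something taken from the SML/NJ documentation: it assumes that char is 8 bits wide in SML/NJ, and it spells the UTF-8 bytes of søk with decimal escapes so that the source stays ASCII-only.

```sml
(* "s\195\184k" is "søk" written as raw UTF-8 bytes:
   0xC3 0xB8 (decimal 195 184) is the UTF-8 encoding of U+00F8 (ø). *)
val s : string = "s\195\184k"

(* String.size counts chars. With 8-bit chars this is a byte count,
   so it gives 4 here rather than 3. *)
val n = String.size s

(* The individual char codes: [115, 195, 184, 107]. *)
val codes = List.map Char.ord (String.explode s)

(* The bytes are written out unchanged; a UTF-8 console renders them as "søk". *)
val () = print (s ^ "\n")
```

If n comes out as 4 and codes shows two separate values for ø, that would suggest that string is just a sequence of bytes and that the UTF-8 interpretation happens only in the console.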
PPS. An attempt to enter a non-ASCII string in an sml.bat REPL session failed with:
stdIn:2.10 Error: illegal non-printing character in string
stdIn:2.11 Error: illegal non-printing character in string
stdIn:2.12 Error: illegal non-printing character in string
...
Sorry for so many questions. I would appreciate any clarification about the state of Unicode and UTF-8 in the world of Standard ML (and SML/NJ), and about convenient ways to work with them.
For instance, I found the library https://github.com/cannam/sml-utf8 which defines WdString. It allows encoding to and decoding from UTF-8 and wide strings, along with other "standard" (for SML) string operations. I tried it with SML/NJ and it seems to work.