Unicode code points range from U+000000 to U+10FFFF. While writing a lexer generator in F#, I ran into the following problem:
For the character set definitions, I intend to use a simple tuple of type `char * char`, expressing a range of characters. Omitting some peripheral details, I also need a range I call `All`, which is supposed to cover the full Unicode range.
Now, it is possible to define a char literal like this: `let c = '\u3000'`. For strings, it is also possible to refer to a full 32-bit code point: `let s = "\U0010FFFF"`. But the latter does not work for chars, because a .NET `char` is a single 16-bit UTF-16 code unit, and a code point above U+FFFF occupies two code units (a surrogate pair), not one.
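To illustrate, here is a small F# snippet showing that the `\U` escape is accepted in a string literal, but the resulting code point spans two UTF-16 code units and therefore cannot fit into a single `char`:

```fsharp
// A BMP code point fits into a single 16-bit char:
let c = '\u3000'

// A full 32-bit code point is fine in a string literal...
let s = "\U0010FFFF"

// ...but it occupies two UTF-16 code units (a surrogate pair):
printfn "%d" s.Length                              // prints 2
printfn "%b" (System.Char.IsHighSurrogate s.[0])   // prints true
```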
So the question is: is there a way I can stick with my `char * char` tuple and get my `All` defined somehow, or do I need to change it to `uint32 * uint32` and define all my character ranges as 32-bit values? And if I have to change, is there a type I have not yet discovered that I should prefer over `uint32`?
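For reference, a minimal sketch of what the `uint32 * uint32` alternative I am considering would look like (the names `CharRange`, `All`, and `contains` are my own, just for illustration):

```fsharp
// Hypothetical sketch: ranges as pairs of 32-bit code points.
type CharRange = uint32 * uint32

// The full Unicode range, U+0000 .. U+10FFFF:
let All : CharRange = (0u, 0x10FFFFu)

// Individual chars would be widened to code points on the fly
// (naive: treats each UTF-16 code unit in isolation):
let contains ((lo, hi): CharRange) (c: char) =
    let cp = uint32 c
    lo <= cp && cp <= hi

printfn "%b" (contains All '\u3000')   // prints true
```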
Thanks in advance.