Convert a UTF-8 character string to its integer value / UniChar (Objective C)

I'm currently developing an Objective-C program that receives UTF-8 characters as strings of the form "U+0008". These strings are of type NSString. I need to convert each such string into the corresponding UniChar value (0x0008).

Do you know an elegant way of doing this?

Thanks! Pedro

Answered by CRD:

If you have an NSString of the form @"U+xxxx", where each x is a hexadecimal digit, then below are two ways to obtain the value; the "elegance" of each is in the eye of the beholder:

a) Use NSScanner. The method scanString:intoString: can be used to check for the U+, the method scanHexInt: to read in the hex value, and the method isAtEnd to check that there is nothing left after the hex value. This approach does not limit the hex number to a maximum of four digits.

b) Use sscanf or strtol. These are C-level APIs; you can obtain a C string from your NSString using UTF8String. With sscanf you can check for the U+ and read in a hex number with a set maximum number of digits in one line, if that is your definition of "elegant". However, checking that all input is consumed requires a little thought (for example, using sscanf's %n directive or strtol's end pointer).
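
As a rough illustration of option (b), here is a minimal sketch built on strtol rather than sscanf. The function name parse_u_plus is made up for this example and is not part of any library. It assumes the input is a NUL-terminated C string (for instance the result of UTF8String) and that between one and four hex digits are acceptable. The digits are counted with strspn first, because strtol on its own would also accept leading whitespace, a sign, or a "0x" prefix.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from any library): parses "U+" followed by
   1 to 4 hex digits and stores the 16-bit value in *out.
   Returns false if the input does not match that form exactly. */
bool parse_u_plus(const char *s, uint16_t *out)
{
    if (s == NULL || out == NULL || strncmp(s, "U+", 2) != 0)
        return false;

    const char *digits = s + 2;

    /* Count the leading hex digits ourselves: strtol alone would also
       accept whitespace, a sign, or a "0x" prefix. */
    size_t n = strspn(digits, "0123456789abcdefABCDEF");
    if (n < 1 || n > 4 || digits[n] != '\0')
        return false;   /* wrong digit count, or trailing input */

    /* At most four hex digits, so the result always fits in 16 bits. */
    *out = (uint16_t)strtol(digits, NULL, 16);
    return true;
}
```

From Objective-C this could be called along the lines of parse_u_plus([input UTF8String], &value), where input is a placeholder for your NSString, and the resulting 16-bit value can then be assigned to a UniChar.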

There are numerous other ways to do this, anything from a roll-your-own parser to regular expressions (NSRegularExpression) that check the format and extract the 4 hex digits in one go, ready for conversion to an integer.
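
For what it's worth, a roll-your-own version can be quite short. The sketch below is only one possible shape: the names hex_digit_value and parse_u_plus_4 are invented for this example, and it assumes the stricter rule that exactly four hex digits must follow the U+, with nothing after them.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: accepts exactly "U+" followed by four hex digits
   and stores the 16-bit value in *out. */
bool parse_u_plus_4(const char *s, uint16_t *out)
{
    if (s == NULL || out == NULL || s[0] != 'U' || s[1] != '+')
        return false;

    uint16_t value = 0;
    for (int i = 2; i < 6; i++) {
        int d = hex_digit_value(s[i]);
        if (d < 0)
            return false;   /* non-hex character, or the string ended early */
        value = (uint16_t)((value << 4) | (uint16_t)d);
    }

    if (s[6] != '\0')
        return false;       /* extra input after the four digits */

    *out = value;
    return true;
}
```

The loop stops at the first non-hex character, which includes the terminating NUL, so it never reads past the end of a short string.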

Note that if you are seeing 4 hex digits then you have a 16-bit value, which is more like UTF-16 than UTF-8.