Proper handling of 128..255 chars in C


I need to process some text encoded in Windows-1251, an 8-bit encoding that uses part of the 128..255 range for Cyrillic. As far as I can tell, C was designed with 7-bit ASCII in mind and has no explicit support for single-byte characters above 127. So I have two questions:

  • Which type is more appropriate for this text: char[] or unsigned char[]?
  • If I use unsigned char[] with the standard library functions (strlen, strcmp), the compiler warns about implicit conversions to char*. Can such a conversion break something? Should I re-implement some of the functions to support unsigned char strings explicitly?

There is 1 answer

Carl Norum On

C has three distinct character types: signed char, unsigned char, and plain char, whose signedness is implementation-defined. For strings, just use char, since that is the type the standard library string functions take. Those functions also work fine on characters with numeric values greater than 127: strlen only looks for the terminating zero byte, and strcmp compares the bytes as if they were unsigned char. You should have no problems using char in your case.
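As a rough illustration of the advice above, here is a minimal sketch that keeps the text in plain char strings and converts to unsigned char only at the point where a byte's numeric value matters (range checks, array indexing). The helper names count_cyrillic_1251 and byte_histogram are hypothetical, made up for this example, and the 0xC0..0xFF range assumes the input really is Windows-1251.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts bytes in the Windows-1251 range 0xC0..0xFF,
   which holds the basic Cyrillic letters. The letters at 0xA8 and 0xB8
   lie outside that range and are not counted here. */
size_t count_cyrillic_1251(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        /* Convert once: c is in 0..255 whether plain char is signed or not. */
        unsigned char c = (unsigned char)*s;
        if (c >= 0xC0)
            ++n;
    }
    return n;
}

/* Hypothetical helper: fills a 256-entry histogram of byte values.
   Indexing with a plain char could produce a negative index on
   platforms where char is signed, hence the cast. */
void byte_histogram(const char *s, size_t hist[256])
{
    memset(hist, 0, 256 * sizeof hist[0]);
    for (; *s != '\0'; ++s)
        hist[(unsigned char)*s]++;
}
```

The strings themselves are ordinary char arrays, so strlen and strcmp can be called on them without any conversion or warning. The same cast to unsigned char is needed before passing a byte to the <ctype.h> functions such as isalpha or toupper, whose argument must be representable as unsigned char or equal to EOF.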