I need to process some Windows-1251-encoded text (an 8-bit encoding that uses part of the range 128..255 for Cyrillic). As far as I can tell, C was created with 7-bit ASCII in mind and has no explicit support for single-byte chars above 127. So I have several questions:
- Which is the more proper type for this text: `char[]` or `unsigned char[]`?
- If I use `unsigned char[]` with built-in functions (`strlen`, `strcmp`), the compiler warns about implicit casts to `char*`. Can such a cast break something? Should I re-implement some of the functions to support `unsigned char` strings explicitly?
C has three distinct character types: `signed char`, `unsigned char`, and `char`, which may be either signed or unsigned depending on the implementation. For strings, you should just use `char`, since that plays nicely with all the built-in functions. They all work fine on characters with numeric values greater than 127, so you should have no problems using `char` in your case.
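To make this concrete, here is a minimal sketch rather than a definitive implementation. The helper names `count_high_bytes`, `ustrlen`, and `ustrcmp` are hypothetical and exist only for illustration. It shows the one place where the signedness of `char` matters (looking at the numeric value of a byte) and, assuming you still end up holding an `unsigned char` buffer, that passing it to the standard functions through a cast is fine, so nothing needs to be re-implemented.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the bytes of s that lie in 128..255. */
size_t count_high_bytes(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        /* Convert to unsigned char before looking at the numeric value:
           where plain char is signed, bytes 128..255 read as negative. */
        if ((unsigned char)*s > 127)
            ++n;
    }
    return n;
}

/* Hypothetical wrappers for code that already holds unsigned char data.
   The character types share size and representation, so the cast only
   changes the pointer type that the compiler sees. */
size_t ustrlen(const unsigned char *s)
{
    return strlen((const char *)s);
}

int ustrcmp(const unsigned char *a, const unsigned char *b)
{
    /* strcmp compares the differing bytes as unsigned char values,
       whichever pointer type the caller started with. */
    return strcmp((const char *)a, (const char *)b);
}
```

For example, with the text "Привет" written as the Windows-1251 bytes `"\xCF\xF0\xE8\xE2\xE5\xF2"`, both `strlen` and `count_high_bytes` return 6. Because `strcmp` interprets bytes as `unsigned char`, `strcmp("\xE0", "z")` is positive even on platforms where `char` is signed. The same cast to `unsigned char` is needed before passing a byte to the `<ctype.h>` functions or using it as an array index.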