How to read Windows-1252 (CP-1252) characters in C++?


I did some Googling around, but couldn't find a clear answer (not using the correct terminology perhaps?)

Anyway, I have some text files in ANSI format (Windows-1252, also known as CP-1252) whose characters I want to process in a C++ program, but I don't know how to store the 2-byte characters that correspond to decimal codes 128 through 255. Just to be sure, I tried the following code:

ifstream infile("textfile.txt");
char c;
infile>>c;                           //also tried infile.get(c);  
cout<<c;

Unsurprisingly, the 1-byte char failed to store any symbol from the extended set after 0x7F (I think it just displayed the ASCII symbol corresponding to the value of the first byte and discarded the second, or vice versa).

There is 1 answer

Answered by Elvis Dukaj

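Windows-1252 (CP-1252) is a single-byte, 8-bit encoding: every character, including those with codes 128 through 255, occupies exactly one byte, so each one fits in a char. The bytes above 0x7F simply are not part of ASCII. I suggest you write a conversion table from Windows-1252 to wchar_t, then read char by char and convert each one to wchar_t. You can write the table as a map< uint8_t, wchar_t >. For example: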

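#include <cstdint>
#include <fstream>
#include <iostream>
#include <map>
#include <string>
using namespace std;

// Convert one Windows-1252 byte to the matching wide character.
wchar_t WCP1252Towc( char ch )
{
    // Keyed by uint8_t so that bytes 0x80-0xFF are valid keys even where char is signed.
    static const map< uint8_t, wchar_t > table
    {
        {0x30, L'0' },
        {0x31, L'1' },
        // ..
        {0x39, L'9'},

        {0x41, L'A'},
        // ...
        {0x5A, L'Z'},

        {0x61, L'a'},
        // ...
        {0x7A, L'z'},

        // ...
    };

    // Bytes missing from the table yield L'\0'.
    const auto it = table.find( static_cast< uint8_t >( ch ) );
    return it != table.end() ? it->second : L'\0';
}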

// Convert a whole Windows-1252 string, one byte per output character.
wstring WCP1252sTowcs( const string& str )
{
    const auto len = str.size();
    wstring res( len, L'\0' );

    for( size_t i = 0; i < len; ++i )
        res[ i ] = WCP1252Towc( str[ i ] );

    return res;
}

int main()
{
    ifstream infile( "textfile.txt" );
    string line;
    getline( infile, line );                // one line of raw Windows-1252 bytes
    auto unicode = WCP1252sTowcs( line );   // widened to wchar_t
    wcout << unicode;
}
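
A note on filling in the table: in Windows-1252 only the 32 bytes 0x80 to 0x9F differ from Unicode. Bytes 0x00 to 0x7F and 0xA0 to 0xFF have the same numeric value as their Unicode code point. So, as a rough sketch, the table can shrink to those 32 entries. The names cp1252ToWideChar and cp1252ToWide are made up for this example. Passing the five unassigned bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) through unchanged is an assumption; you may prefer to reject them instead.

```cpp
#include <array>
#include <string>

// Hypothetical helper names, not part of any library.
// Code points for bytes 0x80-0x9F, in byte order. The five bytes that
// Windows-1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D) are
// passed through with their own value (an assumption of this sketch).
constexpr std::array<wchar_t, 32> kCp1252High = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
};

// Convert one Windows-1252 byte to a wide character.
wchar_t cp1252ToWideChar(char ch)
{
    // Go through unsigned char so bytes above 0x7F are not negative.
    const auto byte = static_cast<unsigned char>(ch);
    if (byte >= 0x80 && byte <= 0x9F)
        return kCp1252High[byte - 0x80];
    // 0x00-0x7F and 0xA0-0xFF have the same value in Unicode.
    return static_cast<wchar_t>(byte);
}

// Convert a whole Windows-1252 string, one byte per output character.
std::wstring cp1252ToWide(const std::string& bytes)
{
    std::wstring out;
    out.reserve(bytes.size());
    for (char ch : bytes)
        out.push_back(cp1252ToWideChar(ch));
    return out;
}
```

Every Windows-1252 character lies below U+FFFF, so one wchar_t per byte is enough even where wchar_t is 16 bits wide. Printing the result through wcout usually also needs a suitable locale or console setup, which depends on the platform.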