EBCDIC to ASCII conversion not working properly

I have to process a file that comes from a mainframe. The file contains some non-Latin text, and I have to check that text for invalid characters. Because the mainframe encodes the data in EBCDIC, I have to convert it to ASCII to do the validation.

I used this code to convert from EBCDIC to ASCII. But when I run the program on the sample input, I get "Hello there]" instead of "Hello there!". I also checked the sample input against the EBCDIC table.

I also generated the lookup table using this, but got the same result.

  • Am I doing anything wrong? Or is the lookup table wrong?
  • Is there any other way to check for invalid characters without converting to ASCII?

Sample code is below...

#include <stdio.h>

static const unsigned char e2a[256] = {
          0,  1,  2,  3,156,  9,134,127,151,141,142, 11, 12, 13, 14, 15,
         16, 17, 18, 19,157,133,  8,135, 24, 25,146,143, 28, 29, 30, 31,
        128,129,130,131,132, 10, 23, 27,136,137,138,139,140,  5,  6,  7,
        144,145, 22,147,148,149,150,  4,152,153,154,155, 20, 21,158, 26,
         32,160,161,162,163,164,165,166,167,168, 91, 46, 60, 40, 43, 33,
         38,169,170,171,172,173,174,175,176,177, 93, 36, 42, 41, 59, 94,
         45, 47,178,179,180,181,182,183,184,185,124, 44, 37, 95, 62, 63,
        186,187,188,189,190,191,192,193,194, 96, 58, 35, 64, 39, 61, 34,
        195, 97, 98, 99,100,101,102,103,104,105,196,197,198,199,200,201,
        202,106,107,108,109,110,111,112,113,114,203,204,205,206,207,208,
        209,126,115,116,117,118,119,120,121,122,210,211,212,213,214,215,
        216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,
        123, 65, 66, 67, 68, 69, 70, 71, 72, 73,232,233,234,235,236,237,
        125, 74, 75, 76, 77, 78, 79, 80, 81, 82,238,239,240,241,242,243,
         92,159, 83, 84, 85, 86, 87, 88, 89, 90,244,245,246,247,248,249,
         48, 49, 50, 51, 52, 53, 54, 55, 56, 57,250,251,252,253,254,255
};

void ebcdicToAscii (unsigned char *s)
{
    while (*s)
    {
        *s = e2a[(int) (*s)];
        s++;
    }
}

int main (void) {
    unsigned char str[] = "\xc8\x85\x93\x93\x96\x40\xa3\x88\x85\x99\x85\x5a";
    ebcdicToAscii (str);
    printf ("%s\n", str);
    return 0;
}

Thanks in advance.

There are 2 answers

Ton Plooij On

Your lookup table is wrong for this data. It converts EBCDIC value 0x5A ('!') to decimal 93, and ASCII decimal 93 is ']'. So your application works fine: it outputs the ']' character the table tells it to. You indicate that you generated the lookup table from a Python sample that uses cp500, which is IBM code page 500. That code page does indeed map EBCDIC value 0x5A to the ']' character. If you use the character set listed here for your lookup table instead, things will be OK.
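As a rough illustration of that fix, the sketch below patches a mutable copy of a cp500-style table so that it follows IBM code page 037 at the positions where the two pages are generally documented to differ. The helper name patch_table_for_cp037 and the '?' placeholders are assumptions made for this example, and the list of positions should be checked against IBM's published tables before relying on it.

```c
#include <stddef.h>

/* One override: EBCDIC byte value and the ASCII byte it should map to. */
struct override {
    unsigned char ebcdic;
    unsigned char ascii;
};

/* Positions where code page 037 differs from code page 500.
 * Assumption: 0x4A and 0x5F hold the cent and not signs in 037, which have
 * no ASCII equivalent, so this sketch substitutes '?' for them. */
static const struct override cp037_overrides[] = {
    { 0x4A, '?' },  /* cent sign, no ASCII equivalent */
    { 0x4F, '|' },
    { 0x5A, '!' },
    { 0x5F, '?' },  /* not sign, no ASCII equivalent */
    { 0xB0, '^' },
    { 0xBA, '[' },
    { 0xBB, ']' },
};

/* Hypothetical helper: apply the overrides to a writable 256-entry table.
 * The table must be a non-const copy of the original lookup table. */
void patch_table_for_cp037(unsigned char table[256])
{
    size_t count = sizeof cp037_overrides / sizeof cp037_overrides[0];
    for (size_t i = 0; i < count; i++)
        table[cp037_overrides[i].ebcdic] = cp037_overrides[i].ascii;
}
```

To use it, copy e2a into a non-const array (for example with memcpy), call patch_table_for_cp037 on the copy, and index the copy in ebcdicToAscii. With that change the sample input's final byte 0x5A comes out as '!'.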

Hogstrom On

There is a utility called iconv in USS on z/OS that will do the conversion for you. A full reference of the code pages it supports can be found here.
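The same conversion is also available from C through the POSIX iconv(3) library API. The following is a minimal sketch rather than a definitive implementation: the function name ebcdic_to_latin1 is made up for this example, and the code page name passed in (such as "IBM037") is an assumption, because the accepted names differ between iconv implementations. The target is ISO-8859-1 rather than plain ASCII so that characters outside the ASCII range do not abort the conversion.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert in_len bytes of EBCDIC text to ISO-8859-1.
 * from_code is the iconv name of the source code page (implementation
 * dependent). Returns the number of bytes written to out, or -1 if the
 * code page is unknown or the conversion fails. */
long ebcdic_to_latin1(const char *from_code,
                      const char *in, size_t in_len,
                      char *out, size_t out_cap)
{
    iconv_t cd = iconv_open("ISO-8859-1", from_code);
    if (cd == (iconv_t)-1)
        return -1;              /* this iconv does not know the code page */

    char *src = (char *)in;     /* iconv does not modify the input bytes */
    char *dst = out;
    size_t in_left = in_len;
    size_t out_left = out_cap;

    size_t rc = iconv(cd, &src, &in_left, &dst, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;              /* invalid input or output buffer too small */
    return (long)(out_cap - out_left);
}
```

Called with the sample bytes from the question and a source name your iconv recognises for code page 037, this should produce "Hello there!". On some platforms you need to link with -liconv.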

That said, should you choose to roll your own, here are a few suggestions. First, EBCDIC to ASCII conversion has a few subtle things to consider. EBCDIC is not a single encoding: a variety of code pages are in use on z/OS systems. Here is a link that provides more detail. Code Page 1047 is the usual default for z/OS UNIX System Services, and Code Page 37 is the common choice for US and Canadian data. However, other code pages would change what you are doing. For instance, UK systems commonly use Code Page 285. This means there are subtle differences in character translation.

So you would need one translation table for each code page you're translating from, in order to properly convert the special characters that are unique to each code page.
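One way to organise that is to keep the tables in a small registry keyed by code page number and pass the chosen table to the converter. The sketch below assumes that layout; codepage_table, find_table and translate_buffer are illustrative names, not part of any library. The converter takes an explicit length instead of stopping at a zero byte, since file data from the mainframe may contain 0x00 bytes.

```c
#include <stddef.h>

/* Hypothetical registry entry: a code page number and its 256-entry table. */
struct codepage_table {
    int ccsid;
    const unsigned char *table;
};

/* Look up the table for a code page; returns NULL if it is not registered. */
const unsigned char *find_table(const struct codepage_table *tables,
                                size_t count, int ccsid)
{
    for (size_t i = 0; i < count; i++) {
        if (tables[i].ccsid == ccsid)
            return tables[i].table;
    }
    return NULL;
}

/* Translate len bytes in place using the given table. Unlike a loop that
 * stops at the first zero byte, this also converts embedded 0x00 bytes. */
void translate_buffer(const unsigned char table[256],
                      unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = table[buf[i]];
}
```

A caller would register one table per code page it expects, call find_table with the code page of the incoming file, and refuse to process the file if the result is NULL instead of guessing.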

Also, this is pure preference, but I prefer seeing the table in hex rather than decimal for this kind of translation data. It's often how people refer to characters on Z:

static const unsigned char e2a[256] = {
      0x00, 0x01, 0x02, 0x03, 0x9c, 0x09, 0x86, 0x7f, 0x97, 0x8d, 0x8e, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
      0x10, 0x11, 0x12, 0x13, 0x9d, 0x85, 0x08, 0x87, 0x18, 0x19, 0x92, 0x8f, 0x1c, 0x1d, 0x1e, 0x1f,
      0x80, 0x81, 0x82, 0x83, 0x84, 0x0a, 0x17, 0x1b, 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x05, 0x06, 0x07,
      0x90, 0x91, 0x16, 0x93, 0x94, 0x95, 0x96, 0x04, 0x98, 0x99, 0x9a, 0x9b, 0x14, 0x15, 0x9e, 0x1a,
      0x20, 0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7, 0xa8, 0x5b, 0x2e, 0x3c, 0x28, 0x2b, 0x21,
      0x26, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf, 0xb0, 0xb1, 0x5d, 0x24, 0x2a, 0x29, 0x3b, 0x5e,
      0x2d, 0x2f, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7, 0xb8, 0xb9, 0x7c, 0x2c, 0x25, 0x5f, 0x3e, 0x3f,
      0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf, 0xc0, 0xc1, 0xc2, 0x60, 0x3a, 0x23, 0x40, 0x27, 0x3d, 0x22,
      0xc3, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68, 0x69, 0xc4, 0xc5, 0xc6, 0xc7, 0xc8, 0xc9,
      0xca, 0x6a, 0x6b, 0x6c, 0x6d, 0x6e, 0x6f, 0x70, 0x71, 0x72, 0xcb, 0xcc, 0xcd, 0xce, 0xcf, 0xd0,
      0xd1, 0x7e, 0x73, 0x74, 0x75, 0x76, 0x77, 0x78, 0x79, 0x7a, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7,
      0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf, 0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,
      0x7b, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48, 0x49, 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed,
      0x7d, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, 0x50, 0x51, 0x52, 0xee, 0xef, 0xf0, 0xf1, 0xf2, 0xf3,
      0x5c, 0x9f, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58, 0x59, 0x5a, 0xf4, 0xf5, 0xf6, 0xf7, 0xf8, 0xf9,
      0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff
 };