I am trying to map one list of characters to another and then convert a string accordingly. To do this, I need to define a domain and a range for the conversion.

I wrote a function called "charConvert" that checks whether the input character is one of the predefined characters and, if so, replaces it with the target value. It does this with an exhausting number of ifs, and I would like to do it with arrays (or something more efficient).

char charConvert (char b){
    if (b == 'a'){
        b = 'b';
        return b;
    }
    if (b == 'b'){
        b = 'c';
        return b;
    }
    if (b == 'c'){
        b = 'd';
        return b;
    }
    //....list goes like that
    return b; // unmapped characters are returned unchanged
}

Is there a cleverer solution for this kind of mapping problem?

3 Answers

0
Alex Reynolds On

If all your characters are single-byte chars, you can use a 256-element static unsigned char array to map one character to another.

This approach is used, for instance, for C functions that take the reverse complement of a sequence of DNA. The following is a table that maps nucleotides to their IUPAC complement:

static unsigned char seq_comp_table[256] = {
      0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,  14,  15,
     16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,  28,  29,  30,  31,
     32,  33,  34,  35,  36,  37,  38,  39,  40,  41,  42,  43,  44,  45,  46,  47,
     48,  49,  50,  51,  52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,
     64, 'T', 'V', 'G', 'H', 'E', 'F', 'C', 'D', 'I', 'J', 'M', 'L', 'K', 'N', 'O',
    'P', 'Q', 'Y', 'S', 'A', 'A', 'B', 'W', 'X', 'R', 'Z',  91,  92,  93,  94,  95,
     96, 't', 'v', 'g', 'h', 'e', 'f', 'c', 'd', 'i', 'j', 'm', 'l', 'k', 'n', 'o',
    'p', 'q', 'y', 's', 'a', 'a', 'b', 'w', 'x', 'r', 'z', 123, 124, 125, 126, 127,
    128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143,
    144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159,
    160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175,
    176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191,
    192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207,
    208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223,
    224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
    240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255
};

To get the base (or "letter") that is complementary to the base A, index the table like this (uint8_t comes from <stdint.h>):

unsigned char complementary_base = seq_comp_table[(uint8_t)'A']; /* T */
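
Rather than typing out all 256 entries by hand, the same kind of table can be filled at startup: start from the identity mapping and override only the characters of interest. The following is a minimal sketch, not part of the original table code; make_table and convert are hypothetical helper names, and the a/b/c overrides are taken from the question.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Build a 256-entry table: every byte maps to itself unless overridden.
std::array<unsigned char, 256> make_table()
{
    std::array<unsigned char, 256> table{};
    for (std::size_t i = 0; i < table.size(); ++i)
        table[i] = static_cast<unsigned char>(i);

    // Overrides taken from the question's example (hypothetical choice).
    table[static_cast<unsigned char>('a')] = 'b';
    table[static_cast<unsigned char>('b')] = 'c';
    table[static_cast<unsigned char>('c')] = 'd';
    return table;
}

// Convert a whole string in place, one table lookup per character.
void convert(std::string& s, const std::array<unsigned char, 256>& table)
{
    for (char& c : s)
        c = static_cast<char>(table[static_cast<unsigned char>(c)]);
}
```

The cast to unsigned char before indexing matters because plain char may be signed, in which case bytes above 127 would otherwise produce a negative index.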

If you're working with wide characters, you might use std::map<wchar_t, wchar_t> instead.
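
A map-based lookup for wide characters might look roughly like the sketch below. The helper names map_char and map_string are hypothetical, and returning unmapped characters unchanged is an assumption about the desired behavior.

```cpp
#include <map>
#include <string>

// Look up c in the mapping; characters without an entry are returned unchanged.
wchar_t map_char(wchar_t c, const std::map<wchar_t, wchar_t>& mapping)
{
    auto it = mapping.find(c);
    return it != mapping.end() ? it->second : c;
}

// Apply the mapping to every character of a wide string.
std::wstring map_string(const std::wstring& in,
                        const std::map<wchar_t, wchar_t>& mapping)
{
    std::wstring out;
    out.reserve(in.size());
    for (wchar_t c : in)
        out.push_back(map_char(c, mapping));
    return out;
}
```

Each lookup in std::map is logarithmic in the number of entries, so this is slower than the array approach, but it does not need a table that covers the whole character range.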

2
Acorn On

Yes, indexing into an array of results is about the fastest you can get. On an architecture like x86 it reduces to essentially a single instruction that reads the result from memory.

In particular, if you don't need all characters, as is usually the case, you can offset by the starting one, as shown here:

char charConvert(char b)
{
    static constexpr char mapping[] = {
        'x', // for 'a'
        'y', // for 'b'
        'z', // for 'c'
    };

    // Assuming ASCII (fine in most environments, but not guaranteed)
    return mapping[b - 'a'];
}

It is a good idea to add out-of-bounds checking, at least in debug mode.
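
One way such a check could look is sketched here. The name charConvertChecked is hypothetical, and passing out-of-range characters through unchanged (rather than asserting) is an assumed design choice.

```cpp
#include <cstddef>

// Hypothetical bounds-checked variant of charConvert:
// characters outside the table are returned unchanged.
char charConvertChecked(char b)
{
    static constexpr char mapping[] = {
        'x', // for 'a'
        'y', // for 'b'
        'z', // for 'c'
    };
    constexpr std::size_t count = sizeof(mapping) / sizeof(mapping[0]);

    // Still assumes the mapped characters are contiguous, as in ASCII.
    if (b < 'a' || static_cast<std::size_t>(b - 'a') >= count)
        return b;
    return mapping[b - 'a'];
}
```

For a debug-only check, the if could be replaced by an assert on the same condition, so that release builds keep the single-lookup fast path.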

0
William Torkington On

For simplicity:

// Array indexed by (letter - 'A'); in ASCII, 'A'..'z' needs 58 entries, so 62 is enough:
char encode[62] = {0};

// Adding to array:
encode['A'-'A'] = 'q';
encode['f'-'A'] = 'g';
// etc

// Pulling from array:
printf("%c\n", encode['f'-'A']);