Encrypting a string that contains unicode results in unrecognized characters


I'm trying to encrypt a string in C#:

static public string Encrypt(char[] a)
{
    for (int i = 0; i < a.Length; i++)
    {
        a[i] -= (char)(i + 1);
        if (a[i] < '!')
        {
            a[i] += (char)(i + 20);
        }
    }
    return new string(a);
}

Now, when I put in this string:

"Qui habite dans un ananas sous la mer?".

The encryption comes out as:

`Psf3c[[ak[3XT`d3d\3MYKWIZ3XSXU3L@?JAMR`

There's an unrecognizable character in there, after the @. I don't know how it got there, and I don't know why.

If I try to decrypt it using this method:

static public string Decrypt(char[] a)
{
    for (int i = 0; i < a.Length; i++)
    {
        a[i] += (char)(i + 1);
        if ((a[i] - 20) - i <= '!')
        {
           a[i] -= (char)(i + 20);
        }
    }
    return new string(a);
}

This is the (incorrect) output:

Qui habite dans un ananas sous laamerx.

How do I allow the encryption routine to access unicode characters?

There are 3 answers

Jon Hanna

Generally with modern encryption we don't pay attention to the characters (we may not even have any, we might be encrypting a picture or a sound file), we pay attention to the bytes.

You could take the same approach. Get a stream of bytes from the text in a particular encoding (UTF-8 would be a good one) and then do your encryption on that.

The encrypted bytes are then your output. If you need to have something that can be written down you could use base-64 to produce a textual representation.

The encryption still won't be very good (doing that well is the hard part; for real use we'd pick an established, well-tested encryption scheme), but you'll have a viable approach that won't produce illegal Unicode sequences such as non-characters or mismatched surrogates.
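A minimal sketch of that bytes-first approach. The position-based byte shift below is a toy stand-in for a real cipher (my own invention for illustration, not anything standard); the point is the pipeline: UTF-8 bytes in, transformed bytes, Base64 text out.

```csharp
using System;
using System.Text;

// Encode to UTF-8, transform the raw bytes, then Base64 the result so
// the ciphertext is always printable, copy-paste-safe text.
// The shift-by-position step is a toy placeholder for a real cipher.
static string Encrypt(string plaintext)
{
    byte[] data = Encoding.UTF8.GetBytes(plaintext);
    for (int i = 0; i < data.Length; i++)
        data[i] = (byte)(data[i] + i + 1);   // wraps mod 256, so always reversible
    return Convert.ToBase64String(data);
}

static string Decrypt(string ciphertext)
{
    byte[] data = Convert.FromBase64String(ciphertext);
    for (int i = 0; i < data.Length; i++)
        data[i] = (byte)(data[i] - (i + 1)); // undo the shift
    return Encoding.UTF8.GetString(data);
}

string original = "Qui habite dans un ananas sous la mer?";
string encrypted = Encrypt(original);
Console.WriteLine(encrypted);            // Base64: only printable ASCII
Console.WriteLine(Decrypt(encrypted));
```

Because the transform happens on bytes and the output is Base64, no intermediate value can ever form a broken char, whatever the input string contains.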

BufferOverflow

That's pretty weak encryption. Your problem is that the encryption algorithm outputs character values that can't be printed in a viewable format.

A solution is to encode the output in some way: either print it as a list of decimal values with separators, or use an encoding scheme such as Base64 or Radix-64.

Just a tip: most modern encryption algorithms use the XOR operator to encrypt data. I wrote a simple XOR cipher with CBC chaining mode for you. To be clear, this is far from a secure algorithm, but it's much more secure than your current approach.

public char[] encryptCBC(char[] plain, char[] password, char[] iv)
{
    char[] ciphertext = new char[8];

    for (int i = 0; i < 8; i++)
    {
        ciphertext[i] = (char)(plain[i] ^ iv[i]);
        ciphertext[i] ^= password[i];
    }

    return ciphertext;
}

public char[] decryptCBC(char[] ciphertext, char[] password, char[] iv)
{
    char[] plaintext = new char[8];

    for (int i = 0; i < 8; i++)
    {
        plaintext[i] = (char)(ciphertext[i] ^ password[i]);
        plaintext[i] ^= iv[i];
    }

    return plaintext;
}

This is a block cipher, meaning it encrypts one block (n bytes) per call; in this example the block is 8 bytes. So the iv (initialization vector: random data) needs to be 8 bytes long, and the password also needs to be 8 bytes long. The text you are encrypting must be split up into blocks of 8 bytes, and you loop the function until all the data is encrypted. For example, if you have 32 bytes of data to encrypt, it takes 4 calls to complete the encryption.

EDIT: Forgot to mention that you pass random data as the iv for the first block, and then pass the ciphertext of the previous block as the iv for the next block, and so on.
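The splitting-and-chaining loop described above can be sketched end to end. This is a self-contained version of the idea (my own helper names, not the answer's exact functions): `XorBlock` stands in for one XOR pass, and the driver does the CBC chaining, feeding each ciphertext block in as the iv for the next block.

```csharp
using System;

// One XOR pass over an 8-char block with an 8-char key.
static char[] XorBlock(char[] block, char[] key)
{
    char[] result = new char[8];
    for (int i = 0; i < 8; i++)
        result[i] = (char)(block[i] ^ key[i]);
    return result;
}

// CBC chaining: block 0 mixes with the iv, every later block mixes
// with the previous *ciphertext* block. Input length must be a
// multiple of 8 (real code would pad; omitted here for brevity).
static char[] EncryptCbc(char[] plain, char[] password, char[] iv)
{
    char[] output = new char[plain.Length];
    char[] chain = iv;
    for (int offset = 0; offset < plain.Length; offset += 8)
    {
        char[] block = new char[8];
        Array.Copy(plain, offset, block, 0, 8);
        block = XorBlock(XorBlock(block, chain), password);
        Array.Copy(block, 0, output, offset, 8);
        chain = block;                     // next block chains on this ciphertext
    }
    return output;
}

static char[] DecryptCbc(char[] cipher, char[] password, char[] iv)
{
    char[] output = new char[cipher.Length];
    char[] chain = iv;
    for (int offset = 0; offset < cipher.Length; offset += 8)
    {
        char[] block = new char[8];
        Array.Copy(cipher, offset, block, 0, 8);
        char[] plain = XorBlock(XorBlock(block, password), chain);
        Array.Copy(plain, 0, output, offset, 8);
        chain = block;                     // chain on the ciphertext, not the plaintext
    }
    return output;
}

// 32 chars = 4 blocks of 8; password and iv are 8 chars each.
char[] message  = "Qui habite dans un ananas sous l".ToCharArray();
char[] password = "p4ssw0rd".ToCharArray();
char[] iv       = "r4nd0mIV".ToCharArray();
char[] encrypted = EncryptCbc(message, password, iv);
Console.WriteLine(new string(DecryptCbc(encrypted, password, iv)));
```

Note that decryption applies the same XORs in reverse order with the same chaining, which is why the iv and password must be shared (and the same) on both sides.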

The Vermilion Wizard

The reason you're getting an unprintable character is this line:

a[i] -= (char)(i + 1);

What's happening is that the space inside `la mer` is the 34th character of the string, and the integer value of a space is 0x20 = 32. This means that when you subtract (i + 1) you get -2. But you're storing the result in a char, which is an unsigned type, so it actually becomes 0xFFFE = 65534. Then when you test `a[i] < '!'` you get false, because a[i] is now a large positive number.
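You can see the wraparound in isolation with just the problem character:

```csharp
using System;

char c = ' ';               // 0x20 = 32
c -= (char)34;              // 32 - 34 = -2 wraps to 0xFFFE = 65534
Console.WriteLine((int)c);  // 65534: a Unicode non-character
Console.WriteLine(c < '!'); // False, so the "fix-up" branch never runs
```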

Instead, what you should do (if you really want to implement this algorithm) is store the result in a signed type, manipulate it as you're doing, and then convert it to a char at the end.

    int value = (int)a[i] - (i + 1);
    if (value < (int)'!')
    {
        value += i + 20;
    }
    a[i] = (char)value;

(Extra type casts for emphasis.)

It may not be necessary, but I'd recommend using the same pattern in the Decrypt method as well. It's generally much easier to reason about code which works on a temporary variable rather than editing things in place.
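For completeness, here's the snippet folded back into the question's Encrypt method (Decrypt is left aside, since making its condition a true inverse is a separate problem). With the arithmetic done in a signed int, the `< '!'` test fires correctly and every output character stays printable:

```csharp
using System;

// The question's Encrypt with the signed-temporary fix applied:
// do the arithmetic in an int, cast to char only once at the end.
static string Encrypt(char[] a)
{
    for (int i = 0; i < a.Length; i++)
    {
        int value = (int)a[i] - (i + 1);
        if (value < (int)'!')        // -2 now really compares as less than 33
        {
            value += i + 20;
        }
        a[i] = (char)value;
    }
    return new string(a);
}

string encrypted = Encrypt("Qui habite dans un ananas sous la mer?".ToCharArray());
Console.WriteLine(encrypted);        // no unprintable characters this time
```

For the troublesome 34th character (the space at i = 33), value is 32 - 34 = -2; the branch fires and adds 33 + 20, giving 51, which is the printable character '3'.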