ICQ encoding of Special Characters


I'm working with the ICQ protocol and have run into a problem with special characters (e.g. diacritics). I read that ICQ uses a different encoding (CP-1251, if I remember correctly).

How can I decode the message text with the correct encoding?

I've tried using the UTF8Encoding class, but without success.

I'm using the ICQ-sharp library.

    private void ParseMessage (string uin, byte[] data)
    {
        ushort capabilities_length = LittleEndianBitConverter.Big.ToUInt16 (data, 2);
        ushort msg_tlv_length = LittleEndianBitConverter.Big.ToUInt16 (data, 6 + capabilities_length);
        string message = Encoding.UTF8.GetString (data, 12 + capabilities_length, msg_tlv_length - 4);

        Debug.WriteLine(message);
    }

If the contact is using the same client, everything is fine; otherwise, incoming and outgoing messages containing diacritics are unreadable.

I've determined (using this -> https://stackoverflow.com/a/12853721/846232) that the text is in BigEndianUnicode encoding. But if the string does not contain diacritics, decoding it that way gives unreadable output (Chinese characters), whereas decoding with UTF8 works fine for text without diacritics. I don't know how to make it decode correctly in every case.


There is 1 answer

9
johv On

If UTF-8 kind of works (i.e. it works for "English", or any US-ASCII characters), then you don't have UTF-16. Latin1 (or Windows-1252, Microsoft's variant), or e.g. Windows-1251 or Windows-1250, are perfectly possible though, since the first part of each of these, containing the Latin letters without diacritics, is the same as in US-ASCII.

Decode like this:

    var encoding = Encoding.GetEncoding("Windows-1250");
    string message = encoding.GetString(data, 12 + capabilities_length, msg_tlv_length - 4);
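If some contacts send UTF-8 and others send a legacy code page, one possible approach is to try a strict UTF-8 decode first and fall back to the code page only when the bytes are not valid UTF-8. The following is a minimal sketch of that idea, not part of ICQ-sharp: the `MessageDecoder` class and its `Decode` method are hypothetical names, and Windows-1250 as the fallback is an assumption that depends on what the other client actually sends.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of ICQ-sharp.
static class MessageDecoder
{
    // Strict UTF-8: throws on invalid byte sequences instead of
    // silently substituting U+FFFD replacement characters.
    private static readonly Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    static MessageDecoder()
    {
        // Needed on .NET Core / .NET 5+ to make Windows code pages available.
        // On the classic .NET Framework the code pages are built in and this
        // line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static string Decode(byte[] data, int offset, int count)
    {
        try
        {
            return StrictUtf8.GetString(data, offset, count);
        }
        catch (DecoderFallbackException)
        {
            // Assumption: non-UTF-8 messages use Windows-1250.
            // Substitute Windows-1251 or Windows-1252 if that matches the sender.
            return Encoding.GetEncoding("windows-1250").GetString(data, offset, count);
        }
    }
}
```

In `ParseMessage` this would be called as `MessageDecoder.Decode(data, 12 + capabilities_length, msg_tlv_length - 4)`. Note that this is a heuristic: a short legacy-encoded string can occasionally happen to be valid UTF-8, so it cannot guarantee the right result in every case.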