Can't we make a better variable-length character encoding by just using the one extra bit left over from 7-bit ASCII?


What if we used the extra bit as a flag? If the flag is set (1), it indicates that the character continues into the next byte. If not (0), it’s the end of the character.

UTF-8 uses:

Byte 1    Byte 2    Byte 3    Byte 4
0xxxxxxx
110xxxxx  10xxxxxx
1110xxxx  10xxxxxx  10xxxxxx
11110xxx  10xxxxxx  10xxxxxx  10xxxxxx

The proposed scheme would use:

Byte 1    Byte 2    Byte 3    Byte 4
0xxxxxxx
1xxxxxxx  0xxxxxxx
1xxxxxxx  1xxxxxxx  0xxxxxxx
1xxxxxxx  1xxxxxxx  1xxxxxxx  0xxxxxxx
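
To make the proposal concrete, here is a minimal Python sketch of the flag-bit scheme laid out above. It assumes the 7-bit groups are stored most significant first, which the layout above does not specify, and the function names `encode_flagged` and `decode_flagged` are made up for illustration.

```python
from typing import List


def encode_flagged(code_point: int) -> bytes:
    """Encode one code point with 7 payload bits per byte.

    The high bit of a byte is 1 when more bytes follow and 0 on the last byte.
    """
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    # Split the value into 7-bit groups, least significant group first.
    groups = [code_point & 0x7F]
    code_point >>= 7
    while code_point:
        groups.append(code_point & 0x7F)
        code_point >>= 7
    # Assumption: emit the most significant group first.
    groups.reverse()
    # Set the continuation flag on every byte except the last one.
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])


def decode_flagged(data: bytes) -> List[int]:
    """Decode a byte string produced by encode_flagged into code points."""
    code_points = []
    value = 0
    pending = False  # True while we are in the middle of a character
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
        pending = bool(byte & 0x80)
        if not pending:
            code_points.append(value)
            value = 0
    if pending:
        raise ValueError("input ends in the middle of a character")
    return code_points
```

Under these assumptions, U+20AC (€) encodes to the two bytes 0xC1 0x2C, where UTF-8 needs three. Counting the x's in the two layouts gives 7, 14, 21, and 28 payload bits for one to four bytes in this scheme, against 7, 11, 16, and 21 in UTF-8.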

Is the reason speed, since in UTF-8 you know the length of the sequence from its first byte? Is this a trade-off between size and speed?
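
For reference, the "size from the first byte" property of UTF-8 can be sketched like this. It is only an illustration of the bit patterns in the UTF-8 layout above, not a full validator, and the function name `utf8_sequence_length` is hypothetical.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return how many bytes a UTF-8 sequence occupies, given its first byte."""
    if lead & 0x80 == 0x00:  # 0xxxxxxx
        return 1
    if lead & 0xE0 == 0xC0:  # 110xxxxx
        return 2
    if lead & 0xF0 == 0xE0:  # 1110xxxx
        return 3
    if lead & 0xF8 == 0xF0:  # 11110xxx
        return 4
    # 10xxxxxx is a continuation byte, and 11111xxx is not used by UTF-8.
    raise ValueError("not a UTF-8 lead byte")
```

In the flag-bit scheme, by contrast, the first byte only says whether at least one more byte follows, so a decoder has to inspect each byte in turn to find where the character ends.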

I tried to find an answer in the early history of Unicode. I read the original Unicode 88 paper, but I couldn't find a definitive answer.

