Strings encoded ASCII and UTF8 have different lengths!

Question

7k views Asked by Hedge At 13 August 2024 at 17:20

I'm reading a stream and am wondering why the UTF-8 encoded string is shorter than the ASCII one.

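  // message (byte[]) and bytesRead come from the stream read, which is not shown.
  ASCIIEncoding encoder = new ASCIIEncoding();
  UTF8Encoding enc = new UTF8Encoding();
  string response = encoder.GetString(message, 0, bytesRead);  // response.Length is 4096
  string responseUtf8 = enc.GetString(message, 0, bytesRead);  // responseUtf8.Length is 3955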

There are 4 answers

Martin Törnwall On 08 October 2010 at 22:03

Perhaps the message contained some characters that couldn't be encoded as a single byte in UTF-8.

Adonais On 08 October 2010 at 22:03

UTF-8 handles strings differently than ASCII: in UTF-8, each character may take 1 to 4 bytes, whereas ASCII treats each byte as one character. The C# UTF-8 decoder counts well-formed UTF-8 characters instead of bytes. I hope this helps you.

Timwi On 08 October 2010 at 22:04

Because when decoding bytes, ASCIIEncoding replaces all bytes greater than 127 (0x7F) with a question mark (?) which is one character, while UTF8Encoding decodes UTF-8 multi-byte sequences correctly into single characters (for example, the three bytes 232,170,158 become the single character 語).
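To see this with the bytes mentioned above, here is a minimal sketch. The class name `DecodeDemo`, the helper `DecodedLengths`, and the sample buffer are made up for illustration; only `Encoding.ASCII`, `Encoding.UTF8`, and `GetString` are actual .NET APIs.

```csharp
using System;
using System.Text;

static class DecodeDemo
{
    // Hypothetical helper: decodes the same bytes with both decoders
    // and returns the length of each resulting string.
    public static (int AsciiLength, int Utf8Length) DecodedLengths(byte[] message)
    {
        string ascii = Encoding.ASCII.GetString(message, 0, message.Length);
        string utf8 = Encoding.UTF8.GetString(message, 0, message.Length);
        return (ascii.Length, utf8.Length);
    }

    static void Main()
    {
        // 'A', then the three UTF-8 bytes of 語 (232, 170, 158), then 'B'.
        byte[] message = { 65, 232, 170, 158, 66 };

        var lengths = DecodedLengths(message);
        Console.WriteLine(lengths.AsciiLength); // 5: "A???B", one '?' per byte above 127
        Console.WriteLine(lengths.Utf8Length);  // 3: "A語B", three bytes became one character
    }
}
```

Every multi-byte sequence in the buffer shrinks the UTF-8 result relative to the ASCII one, which would account for a gap such as 4096 versus 3955.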

**Guffa** · Accepted Answer · 2010-10-08 22:10:33

That's because the stream is actually UTF-8 encoded. If it was ASCII encoded, the strings would be identical.

When read as ASCII, the byte combinations that represent characters outside the 0-127 code set will be read as separate characters, and they will look like garbage.

When read as UTF-8, the byte combinations will be decoded into the correct characters, each multi-byte combination ending up as a single character.

(Note: Strings are not encoded, it's the stream that is encoded. You decode the stream from ASCII or UTF-8 into a Unicode character string.)
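A small round trip can illustrate both points: the bytes are what is encoded, and pure ASCII data decodes identically either way. This is only a sketch; `RoundTripDemo`, `DecodeBothWays`, and the sample strings are invented for the example.

```csharp
using System;
using System.Text;

static class RoundTripDemo
{
    // Hypothetical helper: encodes text as UTF-8 bytes (standing in for the stream),
    // then decodes those same bytes as ASCII and as UTF-8.
    public static (int ByteCount, string AsAscii, string AsUtf8) DecodeBothWays(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return (bytes.Length, Encoding.ASCII.GetString(bytes), Encoding.UTF8.GetString(bytes));
    }

    static void Main()
    {
        // Only characters in the 0-127 range: both decoders give the same string.
        var plain = DecodeBothWays("hello");
        Console.WriteLine(plain.AsAscii == plain.AsUtf8); // True

        // \u00E9 (é) takes two bytes in UTF-8.
        var accented = DecodeBothWays("h\u00E9llo");
        Console.WriteLine(accented.ByteCount);      // 6
        Console.WriteLine(accented.AsAscii.Length); // 6, one character per byte
        Console.WriteLine(accented.AsUtf8.Length);  // 5, the two-byte pair is one character
    }
}
```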
