Why is my Java String shorter than the byte[] it was generated from?


I'm reading a BLOB from a MySQL database using JDBC. I know the resulting byte array is good: I've sent it over HTTP as a string literal of numbers, one per byte, and successfully downloaded the result (a JPG). That was just to prove the MySQL -> Java servlet data is good.

Constructing a new String from this byte array using UTF-8 yields a string that is shorter than the byte array and contains values I can't decipher. If UTF-8 is AT LEAST 1 byte per character, shouldn't the resulting string be AT A MINIMUM the length of the byte array it's generated from? (For this particular example, the byte array length is 12,079,474 and the resulting string length is 11,501,845.)

Thanks for your time!


There is 1 answer

Marc Balmer On

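Your bytes contain data that the UTF-8 decoder interprets as multi-byte sequences: a lead byte followed by one or more continuation bytes. In UTF-8 these bytes have special meaning, and together they form a single Unicode character. Because UTF-8 uses at least one byte per character, the byte count is an upper bound on the string length, not a lower bound. That is why your string is shorter than the number of bytes. The values look undecipherable because the bytes are JPEG image data, not UTF-8 encoded text.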
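The effect can be reproduced on a tiny scale. The following is only a minimal sketch: the class name `Utf8LengthDemo` and its two helper methods are made up for illustration, and the four sample bytes stand in for the start of a JPEG file rather than the data from the question. It assumes the standard behavior of `new String(byte[], Charset)`, which replaces malformed input with a replacement character instead of throwing.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8LengthDemo {

    // Hypothetical helper: decodes the bytes as UTF-8 and returns
    // String.length(), which counts UTF-16 code units, not bytes.
    static int decodedLength(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).length();
    }

    // Hypothetical helper: true only if decoding to a String and encoding
    // back to UTF-8 reproduces the original bytes exactly.
    static boolean survivesRoundTrip(byte[] bytes) {
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        return Arrays.equals(bytes, decoded.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // The character U+00E9 is encoded in UTF-8 as two bytes:
        // a lead byte (0xC3) followed by a continuation byte (0xA9).
        byte[] twoByteChar = {(byte) 0xC3, (byte) 0xA9};
        System.out.println(twoByteChar.length + " bytes decode to "
                + decodedLength(twoByteChar) + " char(s)");

        // Sample binary bytes (not valid UTF-8 text). The decoder cannot
        // represent them, so the original bytes are not recoverable.
        byte[] binary = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        System.out.println("binary survives round trip: "
                + survivesRoundTrip(binary));
    }
}
```

The second check suggests why the decoded values look undecipherable: once binary data has passed through a UTF-8 `String`, the original bytes generally cannot be reconstructed, so image data is better kept as a `byte[]`.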