I'm reading a BLOB from a MySQL database using JDBC. I know the resulting byte array is good: I sent it over HTTP as a string of numbers, one per byte, and successfully downloaded the result as a JPEG. (This was just to prove that the data going from MySQL to the Java servlet is intact.)
Constructing a new String from this byte array using UTF-8 yields a string shorter than the byte array, containing values I can't decipher. If UTF-8 uses AT LEAST 1 byte per character, shouldn't the resulting string be AT A MINIMUM the length of the byte array it is generated from? (In this example, the byte array length is 12,079,474 and the resulting string length is 11,501,845.)
Thanks for your time!
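Your bytes contain data that UTF-8 interprets as multi-byte sequences: a lead byte followed by one to three continuation bytes (bit pattern 10xxxxxx), which together form a single Unicode character. Your premise actually points the other way: because every character takes at least one byte, a string decoded from N bytes can have at most N characters, and every multi-byte sequence makes it shorter. That is why your string is shorter than the number of bytes. A JPEG is not UTF-8 text, so many of its byte sequences are also invalid UTF-8, and Java replaces those with U+FFFD, which is why the values look undecipherable.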
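A small sketch to illustrate this. The byte values are illustrative and not taken from the question's data. The class and method names (`Utf8LengthDemo`, `countNonContinuationBytes`, `decodedLength`, `decodesWithReplacement`) are hypothetical. The sketch shows that six bytes of well-formed UTF-8 decode to three characters, that counting the non-continuation bytes predicts that number, and that the bytes `FF D8 FF E0` (which begin many JPEG/JFIF files) decode to replacement characters.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo {

    // Counts bytes that are NOT UTF-8 continuation bytes (continuation bytes
    // match 10xxxxxx). For well-formed UTF-8 this equals the number of code points.
    static int countNonContinuationBytes(byte[] bytes) {
        int count = 0;
        for (byte b : bytes) {
            if ((b & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    // Decodes as UTF-8 and returns String.length() (UTF-16 code units).
    static int decodedLength(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).length();
    }

    // True if decoding hit malformed input, which the String constructor
    // replaces with U+FFFD.
    static boolean decodesWithReplacement(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).indexOf('\uFFFD') >= 0;
    }

    public static void main(String[] args) {
        // "a" (1 byte) + "é" (C3 A9) + "€" (E2 82 AC) = 6 bytes, 3 characters
        byte[] text = {0x61, (byte) 0xC3, (byte) 0xA9, (byte) 0xE2, (byte) 0x82, (byte) 0xAC};
        System.out.println(text.length + " bytes -> " + decodedLength(text) + " chars"); // prints "6 bytes -> 3 chars"
        System.out.println("non-continuation bytes: " + countNonContinuationBytes(text)); // prints "non-continuation bytes: 3"

        // Start of a typical JPEG: 0xFF never occurs in valid UTF-8
        byte[] jpegStart = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        System.out.println("replacement chars present: " + decodesWithReplacement(jpegStart)); // prints "replacement chars present: true"
    }
}
```

The non-continuation count matches the character count only for well-formed UTF-8. `String.length()` counts UTF-16 code units, so a character outside the Basic Multilingual Plane counts as 2, but it still takes 4 bytes. Either way, the decoded length never exceeds the byte count.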