I have two JavaScript strings that produce exactly the same output when passed to console.log. But when I send them in an HTTP POST to my Jetty-based Java REST API (using Spring MVC), the Jackson JSON parsing library throws an error on the second string, and the server returns a 400 status code. The first string is accepted.
Both strings are generated using the textAngular tool in a browser. When I click the toggle html/rich text button to edit the html, the only difference between the two strings is that, before being sanitized by textAngular, one contains an "&nbsp;" in the html and the other contains a regular space " ":
<p>Howdy </p>
vs
<p>Howdy&nbsp;</p>
where the second one is the one that fails. However, textAngular sanitizes each string before it is sent to the server, so the output looks the same. For both sanitized strings, console.log prints:
<p>Howdy </p>
so they ought to be treated the same: either both throw an error or both pass.
Is console.log doing its own sanitizing on an &nbsp; in the actual string variable? Or do the two strings use different character encodings for the space character, which console.log converts automatically when printing?
- How could I validate my hunch that the output strings from textAngular are using different character encodings for the space character?
- How might I go about solving my actual problem and ensure the output strings share the same character encoding, if that is the problem?
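One way to check the first point is to print each string with everything outside printable ASCII shown as a \uXXXX escape, so characters that merely look like a space become visible. This is only a sketch; escapeNonAscii is a hypothetical helper name, not part of textAngular or any library.

```javascript
// Hypothetical helper: make invisible or look-alike characters visible
// by replacing every UTF-16 code unit outside printable ASCII
// (0x20-0x7e) with its \uXXXX escape.
function escapeNonAscii(str) {
  return str.replace(/[^\x20-\x7e]/g, (ch) =>
    '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0'));
}

const withSpace = '<p>Howdy </p>';
const withNbsp = '<p>Howdy\u00a0</p>';

console.log(escapeNonAscii(withSpace));
console.log(escapeNonAscii(withNbsp));
```

If the two escaped outputs differ, the strings contain different characters even though plain console.log output looks identical.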
Update: I tried the following on both strings:
console.log(html.charCodeAt(8))
returns 32 for the first string and 160 for the second string so it appears their binary representations are in fact different. I suppose I could write my own sanitizer that converts this particular space encoding to the other, however I'm worried there may be other edge cases like this with different characters. I'm wondering if there is a better solution that forces the entire string into the right character encoding.
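One possibly more general approach, assuming the server's failure comes from how non-ASCII bytes in the request body are decoded (which I have not confirmed), would be to send a JSON body that contains only ASCII. JSON allows any character inside a string to be written as a \uXXXX escape, so the parsed value stays the same. The sketch below uses a hypothetical helper name, toAsciiJson.

```javascript
// Hypothetical helper: serialize to JSON, then rewrite every non-ASCII
// UTF-16 code unit as a \uXXXX escape. In JSON.stringify output,
// non-ASCII characters can only appear inside string literals, where
// such escapes are valid, so the parsed result is unchanged.
function toAsciiJson(value) {
  return JSON.stringify(value).replace(/[\u0080-\uffff]/g, (ch) =>
    '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0'));
}

const payload = { html: '<p>Howdy\u00a0</p>' };
const body = toAsciiJson(payload);

console.log(body);
console.log(JSON.parse(body).html.charCodeAt(8));
```

This keeps the non-breaking space in the data instead of replacing it, so it only helps if the problem really is in the transport encoding rather than in the character itself.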
My solution was to add my own sanitizing function that replaces every '\xa0' with a regular space before submitting the POST, where '\xa0' is the hex escape for character code 160 (non-breaking space). This seems to satisfy the server's JSON parser, although I'd be interested to see whether there are related character encoding problems with textAngular's sanitizer, and whether there is a more comprehensive solution.
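A minimal sketch of such a sanitizing step, assuming a plain global replace is enough; the function name replaceNbsp is hypothetical and this only handles the non-breaking space, not other look-alike characters:

```javascript
// Hypothetical sanitizer: replace every non-breaking space (U+00A0,
// character code 160) with a regular space (character code 32).
function replaceNbsp(html) {
  return html.replace(/\xa0/g, ' ');
}

const cleaned = replaceNbsp('<p>Howdy\xa0</p>');
console.log(cleaned.charCodeAt(8));
```

The g flag matters here: without it, only the first non-breaking space in the string would be replaced.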