JavaScript: how to convert text copied into a textarea from any encoding to UTF-8?


I have a simple HTML+JS page with a <textarea>.
The user is supposed to paste some text into it, and the page uses the pasted text as a GET parameter in a remote service URL, which expects UTF-8.

My current code is like this:

<body>
  <textarea id="editor"></textarea>
</body>
<script>
  var content = document.getElementById('editor').innerHTML;
  content = stripTags(content);
  content = decodeHtml(content);
  content = encodeURIComponent(content);
  var url = remoteServiceBaseUrl + content;
  window.open(url, '_blank');

  function stripTags(input) {
    return input.replace(/<(.|\n)*?>/g, '');
  }

  function decodeHtml(html) {
    var txt = document.createElement("textarea");
    txt.innerHTML = html;
    return txt.value;
  }
</script>

But it has trouble with, for example, the %0C (form feed) character found in some Windows-1252 texts: when calling the URL I get the error URIError: malformed URI sequence.

So the question is: how do I convert text to UTF-8, independently of the source encoding, with JavaScript?
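For context, here is a minimal sketch of one way the encoding step could be made more robust. It rests on two facts: a JavaScript string is a sequence of UTF-16 code units whatever encoding the text had before it was pasted, and encodeURIComponent always emits UTF-8 percent-escapes (a form feed on its own becomes %0C without error). What encodeURIComponent does reject with a URIError is a lone surrogate, so that may be one source of the error, though it is not confirmed for this page. The helper names replaceLoneSurrogates, stripControlChars and toUtf8QueryValue are hypothetical, and dropping control characters is an assumption about what the remote service tolerates.

```javascript
// Hypothetical helpers; not part of the original page.

// Replace unpaired UTF-16 surrogates with U+FFFD, since encodeURIComponent
// throws a URIError when it meets one. Valid surrogate pairs are kept as-is.
function replaceLoneSurrogates(text) {
  return text.replace(
    /[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDFFF]/g,
    (match) => (match.length === 2 ? match : '\uFFFD')
  );
}

// Assumption: the remote service does not want control characters.
// Removes C0 controls (including form feed, \x0C) and DEL, but keeps
// tab, line feed and carriage return.
function stripControlChars(text) {
  return text.replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/g, '');
}

// Clean the text, then percent-encode it as UTF-8 for use in a query string.
function toUtf8QueryValue(text) {
  return encodeURIComponent(stripControlChars(replaceLoneSurrogates(text)));
}
```

In the page above this could be called as toUtf8QueryValue(document.getElementById('editor').value). Note that it reads .value rather than .innerHTML: for a textarea, .value holds what the user typed or pasted as plain text, so the tag-stripping and HTML-decoding steps would probably not be needed.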
