Dart sanitize international text


How best do I sanitize text like

abc&#39; a>b<c & a<b>c

converting/displaying

abc&#39; a&gt;b&lt;c &amp; a&lt;b&gt;c

or in clear text

abc' a>b<c & a<b>c

so that I can use it via

myDiv.innerHtml=...   or
myDiv.setInnerHtml(..., validator: myValidator, treeSanitizer: mySanitizer);

A text assignment, myDiv.text=..., converts every & as well as < and >, so the valid apostrophe entity &#39; shows up literally instead of as an apostrophe. HtmlEscape.convert(..) likewise converts every & in every HtmlEscapeMode.
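
For illustration, a minimal sketch of the HtmlEscape behaviour described above, using only dart:convert. The helper name escapeAll is made up for this example, and element mode is picked arbitrarily, since every mode escapes &.

```dart
import 'dart:convert';

/// Hypothetical helper: escapes &, < and > with HtmlEscape in element mode.
String escapeAll(String input) =>
    const HtmlEscape(HtmlEscapeMode.element).convert(input);
```

Calling escapeAll(r'abc&#39; a>b<c & a<b>c') returns abc&amp;#39; a&gt;b&lt;c &amp; a&lt;b&gt;c, so the apostrophe entity is escaped a second time and rendered as the literal text &#39;.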

I could write my own sanitizer, but I hope I have overlooked a standard library call.


There are 2 answers

Ticore Shih (BEST ANSWER)


RegExp for HTML entities

import 'dart:convert';
import 'dart:html';

void main() {
  final htmlStr = r'abc&#39; a>b<c & a<b>' * 3;
  // Group 1: text before an entity; group 2: a complete entity such as
  // &#39; or &amp;; group 3: the remaining text once no entity is left.
  final reg = RegExp(
      r'(.*?)(&(?:#[1-9][0-9]{1,3}|[A-Za-z][0-9A-Za-z]+);)|(.*)',
      dotAll: true);
  final resStr = StringBuffer();
  for (final m in reg.allMatches(htmlStr)) {
    // Escape the plain text, but pass entities through unchanged.
    resStr
      ..write(htmlEscape.convert(m.group(1) ?? ''))
      ..write(m.group(2) ?? '')
      ..write(htmlEscape.convert(m.group(3) ?? ''));
  }
  print(resStr);
  document.body!.setInnerHtml(resStr.toString());
}
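
An alternative sketch of the same idea (escape the plain text, keep entities as they are), written with String.splitMapJoin so it needs only dart:convert and can run outside the browser. The function name escapeKeepingEntities, the entity pattern, and the choice of HtmlEscapeMode.element are assumptions made for this example, not part of the code above.

```dart
import 'dart:convert';

// Assumed pattern for one complete entity: decimal (&#39;), hex (&#x27;)
// or named (&amp;). Names are not checked against the real entity list.
final RegExp _entity =
    RegExp(r'&(?:#[0-9]{1,7}|#[xX][0-9A-Fa-f]{1,6}|[A-Za-z][0-9A-Za-z]*);');

// Element mode escapes only &, < and >.
const HtmlEscape _escapeElement = HtmlEscape(HtmlEscapeMode.element);

/// Hypothetical helper: escapes everything except existing entities.
String escapeKeepingEntities(String input) {
  return input.splitMapJoin(
    _entity,
    onMatch: (Match m) => m[0]!, // keep the entity untouched
    onNonMatch: _escapeElement.convert, // escape the text in between
  );
}
```

With the question's sample, escapeKeepingEntities(r'abc&#39; a>b<c & a<b>c') returns abc&#39; a&gt;b&lt;c &amp; a&lt;b&gt;c, which can then be passed to setInnerHtml.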
Jorg Janke

After some thought, I realized that using Validators or HtmlEscape/Mode was not the best way to solve the problem.

The original problem was that translation engines use &#39; for the apostrophe - probably to not confuse it with the misuse of apostrophe as a single quote.

In summary, the best solution is to replace &#39; with the correct unicode character for the apostrophe, which is actually

The (correct) apostrophe U+0027 (&#39;) is disliked because fonts render it (incorrectly) straight, which graphic designers really hate, just like the straight ".

With that, you can assign the translated text to element.text and if it contains problematic characters, they are escaped automatically by Dart - and rendered just fine.
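
A minimal sketch of that replacement, assuming the translated text contains only the &#39; entity. The function name decodeApostrophe and its apostrophe parameter are made up for this example. The default is the plain U+0027 that &#39; stands for; a typographic character such as U+2019 can be passed instead.

```dart
/// Hypothetical helper: swaps the &#39; entity for a literal apostrophe.
/// [apostrophe] defaults to U+0027; pass '\u2019' for the curly form.
String decodeApostrophe(String translated, {String apostrophe = "'"}) =>
    translated.replaceAll('&#39;', apostrophe);
```

The result can be assigned directly, for example myDiv.text = decodeApostrophe(translated); where myDiv and translated are placeholders for your own element and string.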