Ways to simulate the yet unimplemented <bdi> HTML tag?


The purpose of the <bdi> tag in HTML5 is to isolate bidirectional text from its context, and that's precisely what I'm looking for.

A left-to-right username displays like this:

Welcome, Generic User. [Logout]

With a right-to-left username it would turn into this awful thing:

Welcome, [tougoL] .resU cireneG

or even worse, depending on the context, everything around it (not just the user's name) would be displayed backwards.

The problem is that no browser supports the <bdi> tag yet, so I was wondering: is there a way to simulate it? Which HTML tags could isolate the text as well? I know <span> and <div> do not.

I would rather not remove all bidi characters, but the way I see it, having my site display properly matters more than the right of bidirectional-language users to participate.


There are 2 answers

2
bobince (best answer)

"With a right-to-left username it would turn into this awful thing"

It shouldn't. The text in, for example, an Arabic username would be rendered right-to-left, but it wouldn't affect the flow of the Latin text around it.

The problem you may be thinking of is when a username includes a Unicode bidirectional override control character (for example U+202E, right-to-left override). This affects all inline text following it, which is often a Bad Thing for web sites templating text into HTML.

Probably the simplest solution to this problem is input filtering to remove control characters, both the normal ASCII ones (0x00–0x1F) and the Unicode ones. Unicode and the W3C designate a group of characters as unsuitable for use in markup in this Note, and web applications will generally want to remove them from data. The group includes the bidirectional override characters and several others that can cause odd effects to leak outside their own stretch of text.
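The filtering described above might look roughly like the following sketch. The function name stripControlCharacters is hypothetical, and the Unicode ranges are an assumption chosen for illustration: they cover only the directional embedding, override and isolate characters, not the full list in the Note. Be aware that the ASCII range also removes tabs and newlines, which is probably fine for a single-line field such as a username but not for free text.

```javascript
// Hypothetical helper: removes characters that can leak formatting
// effects into the surrounding text. The ranges are illustrative only.
function stripControlCharacters(input) {
  return input
    // ASCII control characters 0x00–0x1F (includes tab and newline).
    .replace(/[\u0000-\u001F]/g, '')
    // Directional formatting characters:
    // U+202A–U+202E (LRE, RLE, PDF, LRO, RLO) and
    // U+2066–U+2069 (LRI, RLI, FSI, PDI).
    .replace(/[\u202A-\u202E\u2066-\u2069]/g, '');
}

// A name carrying a right-to-left override loses only the override.
console.log(stripControlCharacters('abc\u202Edef')); // → abcdef
```

Ordinary right-to-left letters are left untouched, so this does not stop Arabic or Hebrew users from registering; it only removes the characters that change how the text around the name is laid out.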

1
Jukka K. Korpela

The string in the question does not trigger wrong display order, unless there are control characters in the username string, but e.g. a message of the form

User (N badges) wrote:

would do that, if User were replaced by a name in Arabic letters, say أحمد, and N were replaced by a number, say 3. The rendering would then be

أحمد (3 badges) wrote:

Technically, this is not a bug; it follows from Unicode bidirectionality rules – the strong right-to-left (RTL) directionality of Arabic letters affects characters with weak directionality like parentheses. But it is all wrong in practical terms, of course. Any string that may contain RTL characters in a generally left-to-right context should be protected, isolated. In HTML documents, there are three ways to do that:

  • Character level: use the control characters U+202B (right-to-left embedding, RLE) before and U+202C (pop directional formatting, PDF) after the string. In HTML, you could use &#x202b; and &#x202c; for them. This is supported by IE 9 but not by most other browsers.
  • Markup level: use the <bdi> markup. As mentioned, it is not supported by browsers yet.
  • Stylesheet: use unicode-bidi: embed. This is generally supported by modern browsers.
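The character-level option from the list above can be sketched in a few lines, for cases where the text is assembled in script before being placed in the page. The function name embedRtl is hypothetical, and the sketch assumes the wrapped string is known to be right-to-left; a left-to-right name would need U+202A (left-to-right embedding) instead.

```javascript
const RLE = '\u202B'; // RIGHT-TO-LEFT EMBEDDING
const PDF = '\u202C'; // POP DIRECTIONAL FORMATTING

// Hypothetical helper: wraps a right-to-left string so that its
// directionality does not spill onto the neutral characters after it.
function embedRtl(text) {
  return RLE + text + PDF;
}

// The name is embedded; the rest of the message stays outside the embedding.
const line = embedRtl('أحمد') + ' (3 badges) wrote:';
console.log(line);
```

The same effect in static HTML comes from writing &#x202b; before the name and &#x202c; after it, as the first list item describes.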

You can combine the stylesheet approach with the markup approach. It’s logical to do so, and in future browsers, this double approach will work even with stylesheets disabled:

<script>
document.createElement('bdi');
</script>
<style>
bdi { unicode-bidi: embed; }
</style>
...
<bdi>أحمد</bdi> (3 badges) wrote:

The script code is there to make older versions of IE recognize the <bdi> element, so that styles will take effect on it. This would of course fail when scripting is disabled, so it would be slightly safer to use a <span> with a class, which you could still wrap inside <bdi>. So an alternative is:

<style>
.bdi { unicode-bidi: embed; }
</style>
...
<bdi><span class=bdi>أحمد</span></bdi> (3 badges) wrote:
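If the message is generated by a script rather than written by hand, the markup above could be produced along these lines. This is only a sketch: escapeHtml and isolatedName are hypothetical helper names, and the code assumes the page already defines the .bdi class shown in the stylesheet.

```javascript
// Hypothetical helper: escapes the characters that are special in HTML
// so that a username cannot inject markup of its own.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: wraps a username in <bdi> plus a classed <span>,
// matching the markup pattern shown above.
function isolatedName(name) {
  return '<bdi><span class="bdi">' + escapeHtml(name) + '</span></bdi>';
}

console.log(isolatedName('a<b') + ' (3 badges) wrote:');
// → <bdi><span class="bdi">a&lt;b</span></bdi> (3 badges) wrote:
```

Keeping the wrapping in one function means every place that prints a username gets the same isolation, and the <span> can be dropped in one spot once <bdi> support is widespread.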