Unicode characters aren't combined properly


I am working with some Devanagari text data that I want to display in the browser. Unfortunately, there's one combination of combining characters that doesn't get rendered as a properly combined character.

The problem occurs every time a base character is combined with the Devanagari Stress Sign Udatta ॑ (U+0951) and the Devanagari Sign Visarga ः (U+0903).

An example of this would be र॑ः, which is र (U+0930) + ॑ (U+0951) + ः (U+0903) and should be rendered as one character. But the stress sign and the visarga don't seem to like each other (as you can see above!).
It's no problem to combine the base char with each of the other two signs alone, btw: र॑ / रः

I already tried to use several fonts which should be able to render Devanagari characters (some Noto fonts, Siddhanta, GentiumPlus) and tested it with different browsers, but the problem seems to be something else.

Does anyone have an idea? Is this not a valid combination of symbols?

EDIT: I just tried switching the two marks around to see what happens. It renders as रः॑, so U+0951 and U+0903 don't seem to have the same function, as the stress sign gets rendered on top of the other mark.
It looks like I don't understand Unicode well enough yet.
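The observation in the EDIT matches the Unicode general categories of the two marks: U+0951 is a nonspacing mark (Mn), while U+0903 is a spacing combining mark (Mc) that takes up its own space next to the base. A minimal sketch for inspecting this, using only the standard `Character` API (the class and method names are hypothetical, chosen for illustration):

```java
// Illustrative sketch; MarkCategories and category() are hypothetical names.
class MarkCategories {

    // Maps the general categories relevant here to their Unicode abbreviations.
    static String category(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.OTHER_LETTER:           return "Lo (letter)";
            case Character.NON_SPACING_MARK:       return "Mn (nonspacing mark)";
            case Character.COMBINING_SPACING_MARK: return "Mc (spacing combining mark)";
            default:                               return "other";
        }
    }

    public static void main(String[] args) {
        // Base + udatta + visarga, in the order used in the question.
        String text = "\u0930\u0951\u0903";
        // Print code point, Unicode name and general category for each character.
        text.codePoints().forEach(cp ->
            System.out.printf("U+%04X  %-32s %s%n", cp, Character.getName(cp), category(cp)));
    }
}
```

This only shows that the two marks belong to different classes; it does not by itself explain what a given font or shaping engine does with them.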


There is 1 answer

Answer by skomisa

This is NOT a solution for your problem, but might be useful information:

> I am working with some Devanagari text data I want to display in the browser.

Like you, I couldn't get this to work in any browser despite trying several fonts, including Arial Unicode MS:

[Screenshot: browserDevanagari]

The browser was simply rendering the text Devanagari Test: &#x0930;&#x0903;&#x0951; from within the <body> of a JSP. The stress sign is clearly appearing above the Sign Visarga instead of the base character.
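For anyone repeating the browser test, the numeric character references can be generated from a string instead of being typed by hand. A minimal sketch (the class and method names are hypothetical):

```java
// Illustrative sketch; NumericCharRefs and toNumericRefs() are hypothetical names.
class NumericCharRefs {

    // Encodes every code point of the input as a hexadecimal numeric character reference.
    static String toNumericRefs(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> sb.append(String.format("&#x%04X;", cp)));
        return sb.toString();
    }

    public static void main(String[] args) {
        // Base + visarga + udatta, the order used in the browser test above.
        System.out.println("Devanagari Test: " + toNumericRefs("\u0930\u0903\u0951"));
        // prints: Devanagari Test: &#x0930;&#x0903;&#x0951;
    }
}
```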

> Is this not a valid combination of symbols?

It is a valid combination. I don't know Devanagari, so I don't know whether it is semantically "valid", but it is trivial to generate exactly the character you want from a Java application:

System.out.println("Devanagari test: \u0930\u0903\u0951");

This is the output from executing the println() call, showing the stress sign above the base character:

[Screenshot: devanagara1]

The screenshot above is from NetBeans 8.2 on Windows 10, but the rendering also worked fine using the latest releases of Eclipse and Intellij IDEA. The constraints are:

  • The three characters must be specified in that order in println() for the rendering to work.
  • The Sign Visarga and the Stress Sign Udatta must be presented in their Unicode form. Pasting their glyph representations into the source code won't work, although this can be done for the base character.
  • An appropriate font must be used for the display. I used Arial Unicode MS for the screenshot above, but other fonts such as Serif, SansSerif and Monospaced also worked.
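One way to satisfy the first two constraints without depending on how the source file is saved is to build the string from numeric code points. The sketch below does that and prints through a UTF-8 `PrintStream`. Two assumptions are made here that the answer does not confirm: that the problem with pasted glyphs is related to source-file encoding, and that the console in use expects UTF-8. The class and method names are hypothetical.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Illustrative sketch; ConsoleOutput and fromCodePoints() are hypothetical names.
class ConsoleOutput {

    // Builds a string from numeric code points, so no non-ASCII text is needed in the source.
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Base, then Sign Visarga, then Stress Sign Udatta, in the order listed above.
        String text = fromCodePoints(0x0930, 0x0903, 0x0951);

        // Assumption: the console decodes UTF-8; adjust the charset name if it does not.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println("Devanagari test: " + text);
    }
}
```

Whether the cluster then displays correctly still depends on the console font, as noted in the last constraint.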

> Does anyone have an idea?

Unfortunately not, although it is clear that:

  • The grapheme you want to render exists, and is valid.
  • Although it won't render in a browser, it can be written to the console by a Java application.
  • The problem seems to be that all browsers apply the diacritic (Stress Sign Udatta) to the immediately preceding character rather than the base character.

See "Why are some combining diacritics shifted to the right in some programs?" for more information on this.
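A related check is whether the two orderings are canonically equivalent. If they were, normalization would map both to the same sequence and the order in the source text would not matter. A minimal sketch using `java.text.Normalizer` (the class and method names are hypothetical); it suggests the two orders are distinct sequences, so a renderer receives the marks exactly in the order they were typed:

```java
import java.text.Normalizer;

// Illustrative sketch; OrderCheck and canonicallyEquivalent() are hypothetical names.
class OrderCheck {

    // Two strings are canonically equivalent if their NFD forms are identical.
    static boolean canonicallyEquivalent(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFD)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFD));
    }

    public static void main(String[] args) {
        String udattaFirst  = "\u0930\u0951\u0903"; // order used in the question
        String visargaFirst = "\u0930\u0903\u0951"; // order used in the println() call

        // The visarga has combining class 0, so normalization does not reorder around it.
        System.out.println(canonicallyEquivalent(udattaFirst, visargaFirst)); // false

        // For contrast: acute (class 230) and dot below (class 220) are reordered by NFD.
        System.out.println(canonicallyEquivalent("a\u0301\u0323", "a\u0323\u0301")); // true
    }
}
```

This does not settle which order is the intended one for Devanagari text; it only shows that swapping the marks produces a different string rather than an equivalent spelling.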