Why some software can display all characters and some not?

159 views Asked by At

Reference text: どうもありがとうございました

Copied to:

  • Notepad/Notepad++: displays it with no problems
  • LibreOffice Writer: it changes the font family to work, if you convert to Lucida Console, square boxes appear
  • Windows: displays it with no problems
  • Console: it needs the correct chcp and a font family (Lucida Console displays square boxes here too) which can display them if I am right

Is it possible to explain why Notepad can display any text in any font family and LibreOffice + Console cannot? Where is(are) the difference(s)? Is it possible to have the same behaviour on the console as the Notepad does for example?

1

There are 1 answers

6
Adrian McCarthy On BEST ANSWER

Some Windows fonts have glyphs for many different scripts, some cover a few scripts, and many cover just one. (Fonts which support many scripts are sometimes called "Unicode fonts," which can be a misleading term. In other OSes, these kinds of fonts are more prevalent. Windows itself doesn't ship with any, though I think you get one or two with the Office suite.)

When you try to output text in multiple scripts using standard Windows functions using one of the well-known fonts, then Windows uses font fallback and/or font linking, which automatically switches between fonts as needed to output the whole string. Most programs, like Notepad and Notepad++, thus get coverage automatically.

I haven't read the LibreOffice code, but I suspect that when you select a font for a span of text, it sticks with that font, effectively preventing Windows's font fallback and font linking mechanisms from helping. This isn't surprising, since a WYSIWYG editor is likely to use lower-level APIs for outputting text in order to have more typographic control. But using the lower-level APIs means you don't get fallback and linking for free, so you'd have to implement it yourself, and that's a lot of extra work that may not be important to very many users.

The Windows console has a lot of legacy and limitations that persist for backward compatibility with older programs. The console mostly emulates DOS systems, which didn't have any sort of Unicode support and instead relied on "Code Pages," which are, roughly speaking, alternate mappings between character values and glyphs. Code Pages are geared at just one (or maybe two) scripts, so if you need characters from another script, you were basically out of luck. I think modern versions of Windows have hacked in some support for a pseudo code page that supports UTF-8, but I've never gotten it to work well and it, too, has limitations.