How is text direction for right-to-left languages, like Arabic, encoded in PDF? My understanding is that since PDF is fundamentally a graphical format, the concept of text direction doesn't really need to be encoded. Rather, the glyphs simply need to be painted on-screen from right to left. However, the PDF reference manual mentions an attribute called WritingMode, where you can specify combinations of left-to-right, right-to-left, top-to-bottom, and bottom-to-top.
So my questions are:
(1) If my understanding is correct, and RTL or LTR is merely expressed by the way the glyphs are painted on-screen, what is the point of the WritingMode attribute?
(2) If there is no actual directionality information encoded in the PDF file, other than the order in which the glyphs are painted, how does a PDF-to-text program know whether a given line is supposed to be read right-to-left or left-to-right? (I suppose the program could just check whether the Unicode code points extracted from a ToUnicode map fall into a range that corresponds to an RTL script.)
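To make the heuristic in that parenthetical concrete, here is a rough sketch, assuming the line has already been extracted as a Unicode string (for example via ToUnicode maps). Instead of hard-coding code point ranges, it uses the bidirectional class that Python's standard `unicodedata` module reports for each character. The function name `guess_direction` is hypothetical, and the majority vote over strongly directional characters is a simplification of my own, not the full Unicode Bidirectional Algorithm and not anything the PDF specification prescribes.

```python
import unicodedata

# Bidirectional classes of strongly right-to-left characters:
# "R" covers scripts such as Hebrew, "AL" covers Arabic letters.
RTL_CLASSES = {"R", "AL"}


def guess_direction(text):
    """Guess the reading direction of an extracted line of text.

    Counts strongly RTL and strongly LTR characters and returns
    "rtl", "ltr", or "neutral" (no strongly directional characters).
    Ties are reported as "ltr" (an arbitrary choice for this sketch).
    """
    rtl = 0
    ltr = 0
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in RTL_CLASSES:
            rtl += 1
        elif bidi_class == "L":
            ltr += 1
    if rtl == 0 and ltr == 0:
        return "neutral"
    return "rtl" if rtl > ltr else "ltr"


if __name__ == "__main__":
    for sample in ["hello world", "مرحبا بالعالم", "שלום", "123 456"]:
        print(repr(sample), guess_direction(sample))
```

Digits, spaces, and punctuation carry weak or neutral classes, so a line made up only of those comes back as "neutral" and would have to inherit its direction from surrounding lines.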
Text direction will be set in the Trm