Python and Tkinter process Unicode characters correctly, but Tkinter does not display them correctly.
I am using Python 3.1 and Tkinter on Ubuntu, and I am trying to display Tamil Unicode characters.
All of the processing is done correctly, but the display is wrong.
[Screenshot: the wrong display, as rendered in Tkinter]
[Screenshot: the correct display, as rendered in gedit]
Still not solved:
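from tkinter import Tk, Label, Entry, StringVar, RAISED

root = Tk()
root.geometry('200x200')

# An Entry widget pre-filled with some text
entry = Entry(root)
entry.insert(0, "Placeholder text")
entry.pack()

# A Label bound to a StringVar that holds the Tamil text
var = StringVar()
var.set("கற்றதனால் ஆய பயனென்கொல் வாலறிவன்\nநற்றாள்தொழாஅர் எனின். ")
label = Label(root, textvariable=var, relief=RAISED)
label.pack()

root.mainloop()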
It looks like Tk is mishandling things like 'Class Zero Combining Marks', see: http://www.unicode.org/versions/Unicode6.0.0/ch04.pdf#G124820 (Table 4-4)
I assume one of the sequences that does not display correctly is the code point pair U+0BA9 U+0BC6 (TAMIL SYLLABLE NNNE). According to the Unicode standard, U+0BC6 is a reordrant class zero combining mark, which basically means its glyph is drawn before the consonant it follows in the stored text, so the two glyphs get swapped on screen.
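To check which characters in a string fall into this group, something like the following sketch may help. It only uses the standard unicodedata module; the helper name describe is made up for illustration, and the two-character test string is just the pair mentioned above, not a confirmed reproduction of the Tk problem.

```python
import unicodedata

def describe(text):
    """Return (code point, name, general category, combining class) per character."""
    return [
        ("U+{:04X}".format(ord(ch)),
         unicodedata.name(ch, "<unnamed>"),
         unicodedata.category(ch),
         unicodedata.combining(ch))
        for ch in text
    ]

# The pair discussed above, written with escapes so the order is explicit.
for row in describe("\u0ba9\u0bc6"):
    print(row)
```

A character reported with category Mc and combining class 0 is a spacing combining mark of class zero, which is what U+0BC6 should show. Running describe over the full label text would list every such mark in it.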
The only way to fix it is to file a bug at the Tk bug tracker and hope it gets fixed.