I'm wondering how can I convert ISO-8859-2 (latin-2) characters (I mean integer or hex values that represents ISO-8859-2 encoded characters) to UTF-8 characters.
What I need to do with my project in python:
- Receive hex values from serial port, which are characters encoded in ISO-8859-2.
- Decode them, this is - get "standard" python unicode strings from them.
- Prepare and write xml file.
Using Python 3.4.3
txt_str = "ąęłóźć"
txt_str.decode('ISO-8859-2')
Traceback (most recent call last): File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
The main problem is still to prepare valid input for the "decode" method (it works in python 2.7.10, and thats the one I'm using in this project). How to prepare valid string from decimal value, which are Latin-2 code numbers?
Note that it would be uber complicated to receive utf-8 characters from serial port, thanks to devices I'm using and communication protocol limitations.
Sample data, on request:
68632057
62206A75
7A647261
B364206F
20616775
777A616E
616A2061
6A65696B
617A20B6
697A7970
6A65B361
70697020
77F36469
62202C79
6E647572
75206A65
7963696C
72656D75
6A616E20
73726F67
206A657A
65647572
77207972
73772065
00000069
This is some sample data. ISO-8859-2 pushed into uint32, 4 chars per int.
bit of code that manages unboxing:
l = l[7:].replace(",", "").replace(".", "").replace("\n","").replace("\r","") # crop string from uart, only data left
vl = [l[0:2], l[2:4], l[4:6], l[6:8]] # list of bytes
vl = vl[::-1] # reverse them - now in actual order
To get integer value out of hex string I can simply use:
int_vals = [int(hs, 16) for hs in vl]
This topic is closed. Working code, that handles what need to be done: