Python retrieves garbled text from a MySQL database that uses latin1_swedish_ci


I have a MySQL database that stores text with the latin1_swedish_ci collation. When I retrieve the data from this database with PHP (using UTF-8 encoding), I get the correct result.

Here is the result:

measured at Dungsha coraisland (20°42’N, 116°43’E) during the South China Sea Monsoon Experiment (May-June 1998) have been calibrated and compared with radiative transfer calculations for three clear-sky days.

However, when I use Python with MySQL Connector, I get the wrong result.

Here is the result:

Downward total solar fluxes measured at Dungsha coraisland (20¢X42¡¦N, 116¢X43¡¦E) during the South China Sea Monsoon Experiment (May-June 1998) have been calibrated and compared with radiative transfer calculations for three clear-sky days.

Currently, part of my code looks like this:

import io
import mysql.connector
cnx = mysql.connector.connect(host='localhost', user='root', database='tao', charset='utf8', use_unicode=True)
f = io.open("upload.xml", 'w', encoding='utf-8')
# row and dic are defined elsewhere in the script (not shown)
f.write(row[dic['abs']] + "\n")

Can someone help me? I need to retrieve all the data from the database with Python and write it to the XML file.

Moreover, why do I get the correct result (20°42’N, 116°43’E) from PHP's echo? I have already checked the data using mysql> select..., and the result there is also garbled: 20¢X42¡¦N, 116¢X43¡¦E.


There is 1 answer

Rick James

Yikes! I think you went through both latin1 and big5 character sets.
3230A2583432A1A64E is hex for 20¢X42¡¦N in CHARACTER SET latin1
3230A2583432A1A64E is hex for 20°42’N in CHARACTER SET big5
3230C2B03432E280994E is hex for 20°42’N in CHARACTER SET utf8, which (I guess) is what you wanted stored.
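
The three readings above can be checked with a few lines of Python. This is only a sketch: it assumes Python's built-in "latin-1" and "big5" codecs agree with MySQL's latin1 and big5 character sets for these particular bytes, and the variable names are illustrative.

```python
# The byte sequence quoted above, exactly as it sits in the latin1 column.
raw = bytes.fromhex("3230A2583432A1A64E")

# Same bytes, two interpretations.
as_latin1 = raw.decode("latin-1")  # what the column's declared charset says
as_big5 = raw.decode("big5")       # what the writer most likely meant

# The bytes that would have been stored if the text had been written as UTF-8.
utf8_hex = as_big5.encode("utf-8").hex().upper()

print(as_latin1)
print(as_big5)
print(utf8_hex)
```

The first print should show the garbled form, the second the readable coordinate, and the third the utf8 hex string from the list above.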

It is enough of a nightmare to deal with both latin1 and utf8. But throwing big5 into the mix makes my head spin.

The connection to MySQL is willing to convert from whatever encoding the client declares into whatever is declared for the table/column. But you must declare both correctly.

I think you had big5 data coming in, but you claimed it was latin1.
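
If that diagnosis is right, the text Python receives could be repaired by reversing the latin1 misreading: turn the string back into its original bytes, then decode those bytes as Big5. A minimal sketch, assuming the stored bytes really are Big5; `repair_big5_mojibake` is a hypothetical helper, not part of mysql.connector.

```python
def repair_big5_mojibake(text: str) -> str:
    """Undo a latin1 misreading of Big5 bytes (hypothetical helper)."""
    try:
        # Recover the raw stored bytes, then read them as Big5.
        return text.encode("latin-1").decode("big5")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable as latin1 bytes, or not valid Big5: leave as-is.
        return text


# Garbled coordinate from the question, written with escapes for clarity.
garbled = "(20\u00a2X42\u00a1\u00a6N, 116\u00a2X43\u00a1\u00a6E)"
fixed = repair_big5_mojibake(garbled)
print(fixed)
```

In the question's script this would be applied to row[dic['abs']] before f.write(...). Genuine latin1 text containing accented letters can also happen to form valid Big5 byte pairs, so this should only be run on columns believed to hold Big5 data.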