Best way to insert Unicode data to the DB with latin1_general_ci


I receive Unicode data from an XML file. What is the best and correct way to insert or update this data in a MySQL database whose fields use the latin1_general_ci collation?

Thanks!

Answer by O. Jones

Nitpick: latin1_general_ci is a collation -- a sorting order. The encoding -- the CHARACTER SET -- you are using is latin1.

Entitize the Unicode characters in your strings. Do this after you parse your XML file into values and before you stash those values in your database columns. For example, you'll want to turn ⇨ (an arrow) into &#8680; in your text string before storing it.

// The third argument names the encoding of $instr; values parsed from XML are usually UTF-8.
$outstr = htmlentities($instr, ENT_NOQUOTES, 'UTF-8');

You need to read up on htmlentities because it has lots of options. http://php.net/manual/en/function.htmlentities.php
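
One caveat worth checking against the manual: htmlentities() only replaces characters that have a named entity, so a character such as ⇨ may pass through unchanged. A minimal sketch of one alternative is to emit numeric entities for everything outside ASCII with mb_encode_numericentity(). This assumes the mbstring extension is loaded and the input is UTF-8. The helper name entitize_non_ascii is made up for illustration and is not part of the original answer.

```php
<?php
// Hypothetical helper: replace every non-ASCII character of a UTF-8
// string with a decimal numeric entity such as &#8680;.
function entitize_non_ascii(string $utf8): string
{
    // Map format: [first code point, last code point, offset, mask].
    // This covers everything from U+0080 up to U+10FFFF.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $map, 'UTF-8');
}

// Sample input: an arrow (U+21E8) and an accented letter (U+00E9).
$instr  = "Next \u{21E8} caf\u{E9}";
$outstr = entitize_non_ascii($instr);

echo $outstr, PHP_EOL; // Next &#8680; caf&#233;
```

The result is plain ASCII, so it fits in a latin1 column without any conversion. Note that this sketch leaves an existing & untouched, so input that already contains literal entity text will not survive a later decode unchanged.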

When you retrieve those values from the database, you can either send them directly to a browser, which understands the entitized items, or you can use html_entity_decode() to undo the entitizing operation.
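
As a rough illustration of the decoding step, the sketch below turns a stored, entitized value back into UTF-8 with html_entity_decode(). The variable $stored is a stand-in for a value read from the database, and the flag choice (ENT_QUOTES | ENT_HTML5) is an assumption, not something the answer specifies.

```php
<?php
// Hypothetical value as it might come back from a latin1 column.
$stored = 'Next &#8680; caf&#233;';

// Decode named and numeric entities into a UTF-8 string.
$decoded = html_entity_decode($stored, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $decoded, PHP_EOL;
```

If the value goes straight into an HTML page, this step can be skipped, since the browser resolves the entities itself.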