utf8mb4 encoding of csv files for import to MySQL

4.1k views Asked by At

I have been successful setting up my Ubuntu 16.04 LTS LAMPS server so that MySQL 5.7 now fully supports the utf8mb4 character sets. Asian and French character sets can be entered directly into the database front end XATAFace with no issues.

When I import csv files into MySQL with the phpmyadmin administrative frontend the csv files I am making now with french language characters are not being imported, throwing the familar mysql import error:

1366 Incorrect String value: ‘\xE9ciali…’ for column . . . .

I am guessing a text editor that can encode csv files as utf8mb4 is what is needed; please correct me if this is wrong. Is there a text editor on the Windows platform that would allow csv files to be encoded utf8mb4 so that I can successfully import my csvs into MySQL?

Any ideas how this csv import hurdle can be overcome?

Thanks in advance Mark, Saginaw Michigan

2

There are 2 answers

0
Martin On

utf8mb4 is simply UTF-8 by any other program.

UTF-8 by standard is upto 4-bytes per character (each byte is 8 bits), but for some reason MySQL UTF-8 is only upto 3-bytes per characters so can't show the full UTF-8 character set. Hence the mb4; when people complained to MySQL about this wyrd concept, they set UTF-8 multibyte 4 As the full UTF-8 character set.

There are various issues with Microsoft products not being able to correctly encode CSV files as UTF-8; and not being able to import UTF-8 CSV files correctly.

You can also read similar questions about how to get PHPMyAdmin to correctly import CSV files.

There are far too many possible reasons for your issue, but at the first start I would say use a program such as NotePad++ or OpenOffice (both free) which can work coherently with CSV files, then take your original data and save it as UTF-8 or another all encompassing characer set such as Windows-1252 .

This topic and this topic could also be a very useful eye opener to general UTF-8 and web issues.


Additionally, you might need to add a Byte Order Mark to your UTF-8 CSV file (this is the cause of the Microsoft Excel issue referenced above). This answer helps with this. You can also read here for more information on working with BOMs.

Good luck.

1
Rick James On

If that is typical of the data you have, then connect with latin1 instead of utf8. MySQL will convert to the CHARACTER SET defined for the columns. If you are using LOAD DATA (for importing a .csv file), there is a clause for establishing the source CHARACTER SET.