Encoding problem in MySQL: why are some non-ASCII characters encoded in more than 4 bytes?


I've run into a problem with MySQL in Docker. When I insert non-ASCII characters directly into the database from the initialization SQL script, the characters display correctly in the MySQL console, but their stored encodings are wrong.

I set up a MySQL container with a minimal SQL script that reproduces the problem.

Here's the structure of my directory:

.
├── docker-compose.yml
└── my-sql
    ├── Dockerfile
    └── ddl
        └── mySQL.sql

docker-compose.yml

version: '3.8'
services:
  mysql:
    build:
      context: ./my-sql/
      dockerfile: Dockerfile
    container_name: mysql
    expose:
      - 3306
    ports:
      - 3306:3306
    environment:
      MYSQL_ROOT_PASSWORD: test
      MYSQL_USER: test
      MYSQL_PASSWORD: test
      MYSQL_DATABASE: test
    volumes:
      - "mysql:/var/lib/mysql"

volumes:
  mysql:

Dockerfile

FROM mysql:8.2.0

USER 999:999

COPY ./ddl /docker-entrypoint-initdb.d/

mySQL.sql

CREATE TABLE IF NOT EXISTS `test`(
    `test` VARCHAR(100)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO `test` (`test`) VALUES ("🤮");
INSERT INTO `test` (`test`) VALUES ("平");

ALTER DATABASE test CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; # Does not work

When I connect with the MySQL console and select the test column from the test table, I get this:

mysql> SELECT `test` FROM `test`;
+------+
| test |
+------+
| 🤮 |
| 平  |
+------+
mysql> SELECT HEX(`test`) FROM `test`;
+------------------+
| HEX(`test`)      |
+------------------+
| C3B0C5B8C2A4C2AE |
| C3A5C2B9C2B3     |
+------------------+
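
One way to read the HEX() output above is to decode it as UTF-8. The minimal Python sketch below does that (the helper name `stored_characters` is hypothetical). If this reading is right, neither row holds one character longer than 4 bytes; each holds several ordinary 2-byte characters from the Latin-1 range.

```python
def stored_characters(hex_str: str) -> str:
    """Decode a HEX() result as UTF-8 and return the characters it represents."""
    return bytes.fromhex(hex_str).decode("utf-8")


# Hex strings copied from the SELECT HEX(`test`) output above.
for hex_str in ("C3B0C5B8C2A4C2AE", "C3A5C2B9C2B3"):
    chars = stored_characters(hex_str)
    # Print code points (ASCII only) so the output is safe on any terminal.
    print(hex_str, "->", [f"U+{ord(c):04X}" for c in chars])
```

This prints four code points (U+00F0, U+0178, U+00A4, U+00AE) for the first row and three (U+00E5, U+00B9, U+00B3) for the second. If that is what the column holds, the client would size the column for 4 and 3 characters respectively, which could account for the misaligned `|` borders.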

I looked up how these characters are encoded in various character sets and did not find these byte sequences anywhere. As far as I know, a single UTF-8 character is at most 4 bytes long. I also noticed that the "|" borders in MySQL's output are misaligned, and the misalignment grows with the length of the character's hexadecimal encoding.
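
The observed bytes can be reproduced under one assumption: the script's UTF-8 bytes were read as cp1252 (the character set MySQL calls latin1) and then converted to utf8mb4, i.e. the text was double-encoded. The Python sketch below only tests that hypothesis; it does not show where in the Docker setup the misreading would happen. The helper `double_encode` is hypothetical, and the emoji is written as U+1F92E, the code point whose UTF-8 encoding is F0 9F A4 AE.

```python
def double_encode(text: str) -> bytes:
    """Hypothesis: UTF-8 bytes misread as cp1252, then re-encoded as UTF-8."""
    utf8_bytes = text.encode("utf-8")      # the bytes in the .sql file
    misread = utf8_bytes.decode("cp1252")  # each byte becomes one cp1252 character
    return misread.encode("utf-8")         # each of those characters stored as 2 bytes


for ch in ("\U0001F92E", "\u5e73"):  # the emoji and 平
    original = ch.encode("utf-8").hex().upper()
    stored = double_encode(ch).hex().upper()
    print(f"U+{ord(ch):04X}", original, "->", stored)
```

Under this hypothesis every original byte turns into one character in the column, which would explain why the stored values are 8 and 6 bytes long even though no single UTF-8 character exceeds 4 bytes.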

I checked the hexadecimal encoding of the .sql script in VS Code, and at least the emoji is correctly encoded (F0 9F A4 AE).
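
The same byte check can also be scripted. The sketch below decodes the file strictly as UTF-8, so invalid bytes raise `UnicodeDecodeError`, and lists every non-ASCII character with its UTF-8 bytes. The function name `non_ascii_report` is hypothetical, and the path is taken from the directory layout above, assuming the script is run from the directory that contains docker-compose.yml.

```python
from pathlib import Path


def non_ascii_report(data: bytes) -> list[tuple[str, str]]:
    """Return (character, UTF-8 hex) for each non-ASCII character in data."""
    text = data.decode("utf-8")  # strict decoding: invalid UTF-8 raises an error
    return [(ch, ch.encode("utf-8").hex().upper()) for ch in text if ord(ch) > 0x7F]


script = Path("my-sql/ddl/mySQL.sql")  # assumed relative path from the project root
if script.exists():
    for ch, hex_bytes in non_ascii_report(script.read_bytes()):
        print(f"U+{ord(ch):04X}", hex_bytes)
```

For the script above, the expected entries would be U+1F92E with F09FA4AE and U+5E73 with E5B9B3, matching what VS Code shows.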

I also tried another MySQL version (8.0.36), with the same result.

Thanks in advance
