Adding a UCA Collation to a Unicode Character Set: why doesn't it work?


In the Unicode Locale Data Markup Language (LDML), the <rules> element and its sub-elements have been deprecated since version 24. However, the MySQL documentation example still uses the deprecated element.

When I added a collation to MySQL using the latest CLDR collation definition, which is written with the <cr> element, the collation did not take effect.

I want to add a MySQL collation for the utf8mb4 character set that uses the stroke collation from zh.xml.

MySQL Path: mysql-8.0.28-winx64\share\charsets\index.xml

MySQL version: 8.0.28

Stroke collation definition:

http://www.unicode.org/reports/tr35/#Element_rules

https://dev.mysql.com/doc/mysql-g11n-excerpt/8.0/en/ldml-rules.html

How to repeat

Step 1. Edit mysql-8.0.28-winx64\share\charsets\index.xml and add a collation element (with the rule content copied from the CLDR zh.xml collation), for example:

<charset name="utf8mb4">
  <family>Unicode</family>
  <description>UTF-8 Unicode</description>
  <collation name="utf8mb4_stroke_ci" id="1030" type="stroke">
    <cr><![CDATA[
      [import zh-u-co-private-pinyin]
      ...more data...
    ]]></cr>
  </collation>
</charset>
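Before restarting the server, it can help to confirm which rule syntax each collation entry in index.xml actually contains. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `SAMPLE` string is a trimmed, hypothetical stand-in for the real file (which you could load with `ET.parse(path).getroot()` instead, assuming its root element is `<charsets>`), and the helper `rule_syntax`, the collation `utf8mb4_stroke_rules_ci` and the id 1031 are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for share/charsets/index.xml.
SAMPLE = """<charsets>
  <charset name="utf8mb4">
    <collation name="utf8mb4_stroke_ci" id="1030">
      <cr><![CDATA[ &a < b ]]></cr>
    </collation>
    <collation name="utf8mb4_stroke_rules_ci" id="1031">
      <rules><reset>a</reset><p>b</p></rules>
    </collation>
  </charset>
</charsets>"""


def rule_syntax(root: ET.Element, charset: str, collation: str) -> str:
    """Return 'cr', 'rules', or 'none' for the named collation entry."""
    path = f"./charset[@name='{charset}']/collation[@name='{collation}']"
    coll = root.find(path)
    if coll is None:
        raise KeyError(f"{charset}/{collation} not found")
    if coll.find("cr") is not None:
        return "cr"      # CLDR 24+ basic rule syntax inside <cr>
    if coll.find("rules") is not None:
        return "rules"   # deprecated XML element syntax
    return "none"


root = ET.fromstring(SAMPLE)
for name in ("utf8mb4_stroke_ci", "utf8mb4_stroke_rules_ci"):
    print(name, rule_syntax(root, "utf8mb4", name))
```

This only inspects the file; it says nothing about which of the two syntaxes the server's LDML parser accepts.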

Step 2. Restart mysql server

Step 3. Check that the collation was added successfully

mysql> SHOW COLLATION WHERE Collation = 'utf8mb4_stroke_ci';
+-------------------+---------+------+---------+----------+---------+---------------+
| Collation         | Charset | Id   | Default | Compiled | Sortlen | Pad_attribute |
+-------------------+---------+------+---------+----------+---------+---------------+
| utf8mb4_stroke_ci | utf8    | 1030 |         |          |       8 | PAD SPACE     |
+-------------------+---------+------+---------+----------+---------+---------------+
1 row in set (0.00 sec)

Step 4. Create a database and a table, then insert some data

mysql> create database collation_test;
Query OK, 1 row affected (0.02 sec)

mysql> use collation_test;
Database changed

mysql> SET NAMES utf8mb4 COLLATE utf8mb4_stroke_ci;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE member_stroke (
    ->     name VARCHAR(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_stroke_ci
    -> );
Query OK, 0 rows affected (0.05 sec)

mysql> insert into member_stroke values('一'); -- character '一' means '1' and has 1 stroke.
Query OK, 1 row affected (0.01 sec)

mysql> insert into member_stroke values('二'); -- character '二' means '2' and has 2 strokes.
Query OK, 1 row affected (0.01 sec)

mysql> insert into member_stroke values('三'); -- character '三' means '3' and has 3 strokes.
Query OK, 1 row affected (0.01 sec)

Step 5. Select the data ordered by name

mysql> select * from member_stroke order by name;
+------+
| name |
+------+
| 一   |
| 三   |
| 二   |
+------+
3 rows in set (0.00 sec)

Expected result

+------+
| name |
+------+
| 一   |
| 二   |
| 三   |
+------+
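The actual order in Step 5 (一, 三, 二) is plain code-point order: 一 is U+4E00, 三 is U+4E09 and 二 is U+4E8C. In UCA (UTS #10), an untailored character from the CJK Unified Ideographs block gets an implicit primary weight computed from its code point (AAAA = 0xFB40 + (CP >> 15), BBBB = (CP & 0x7FFF) | 0x8000), so the observed result is consistent with the custom tailoring simply not being applied. The sketch below reproduces both orders; the stroke counts come from the comments in Step 4, and the names `implicit_primary` and `STROKES` are illustrative.

```python
# Untailored UCA implicit weights (UTS #10) for CJK Unified Ideographs.
# Assumption: every character is in the CJK Unified Ideographs block,
# so the base weight is 0xFB40.
def implicit_primary(ch: str) -> tuple[int, int]:
    cp = ord(ch)
    aaaa = 0xFB40 + (cp >> 15)
    bbbb = (cp & 0x7FFF) | 0x8000
    return aaaa, bbbb


# Stroke counts as stated in the Step 4 comments.
STROKES = {"一": 1, "二": 2, "三": 3}

names = ["一", "二", "三"]
untailored = sorted(names, key=implicit_primary)
by_stroke = sorted(names, key=STROKES.__getitem__)

print(untailored)  # ['一', '三', '二']  (the actual result)
print(by_stroke)   # ['一', '二', '三']  (the expected result)
```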

Additional information
When I define the collation with the <rules> element instead, it works. However, <rules> has been deprecated since LDML version 24 (2013-09-18).

<charset name="utf8mb4">
  <family>Unicode</family>
  <description>UTF-8 Unicode</description>
  <collation name="utf8mb4_stroke_ci" id="1030" type="stroke" alt="short">
    <rules>
      <!-- START AUTOGENERATED STROKE SHORT -->
      <reset><last_non_ignorable /></reset>
      <p>﷐⠁</p><!-- INDEX 1 -->
      <pc>一</pc><!-- 1 -->
      <p>﷐⠁</p><!-- INDEX 2 -->
      <pc>二</pc><!-- 2 -->
      <p>﷐⠁</p><!-- INDEX 3 -->
      <pc>三</pc><!-- 3 -->
    </rules>
  </collation>
</charset>

With this definition, the query returns the expected order:

mysql> select * from member_stroke order by name;
+------+
| name |
+------+
| 一   |
| 二   |
| 三   |
+------+
3 rows in set (0.00 sec)
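Since the <rules> form works, one possible workaround is to translate a <cr> rule string into the XML elements before adding it to index.xml. This is a minimal sketch, not a complete converter: it assumes the string contains only `&` resets with literal characters, `<` (primary) and `<*` (primary, one per character) relations, and maps them to <reset>, <p> and <pc> as used in the working example above. Anything else, including `[import ...]`, `[before n]`, quoting, and secondary/tertiary operators, is rejected rather than guessed at. The function name `cr_to_rules` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Operators this sketch understands, mapped to LDML XML element names.
SUPPORTED = {"&": "reset", "<": "p", "<*": "pc"}


def cr_to_rules(cr: str) -> str:
    # Reject syntax this sketch does not handle: options/imports in [...],
    # '=' identity, '/' expansions, '|' contexts, quoting, escapes, comments.
    if re.search(r"[\[\]=/|'\\#]", cr):
        raise ValueError("unsupported rule syntax for this sketch")
    rules = ET.Element("rules")
    # Keep operators in the token list: '&', '<*', or a run of '<'.
    parts = re.split(r"(&|<\*|<+)", cr)
    # Whitespace is insignificant in basic rule syntax, so drop it.
    tokens = ["".join(p.split()) for p in parts]
    tokens = [t for t in tokens if t]
    if len(tokens) % 2:
        raise ValueError("every operator needs exactly one operand")
    for op, operand in zip(tokens[0::2], tokens[1::2]):
        if op not in SUPPORTED:  # e.g. '<<' or '<<<'
            raise ValueError(f"unsupported operator {op!r}")
        ET.SubElement(rules, SUPPORTED[op]).text = operand
    return ET.tostring(rules, encoding="unicode")


print(cr_to_rules("&一<*二三"))
```

For example, `cr_to_rules("&一<*二三")` yields `<rules><reset>一</reset><pc>二三</pc></rules>`. The real CLDR stroke rules begin with `[import zh-u-co-private-pinyin]`, so this sketch rejects them as-is; the imported rules would have to be expanded by hand first.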
