I have a fulltext indexed field with charset utf8mb4 on MySQL 8.0
I need to be able to search for queries like "km/h" or "A-B" but with the current charset definition, slash and dash are defined as punctuation characters and are therefore not indexed by the fulltext index.
Looking at the MySQL documentation, it seems that I have the possibilities, (given that utf8 is a strong requirement for me):
- UCA-based collations for Unicode character sets using LDML syntax
- Introduce a new character set and recompile MySQL
I would rather avoid recompilation, as it would introduce complexity into maintenance.
I was not able to work out the unicode documentation to create the collation rules. Could anybody help me to define a specific collation in LDML, in order to consider dash and slash as regular characters ?
Suggest you alter the data in the database, plus the data in queries to avoid punctuation.
One way is to blindly change punctuation to
_
. Another is to change "km/h" to "km_per_h" and "A-B" to "A_dash_B". Either way, you will get much closer to allowing punctuation in FT indexing and searching.