I created FTS5 virtual table in SQLite 3 and inserted many titles in it
CREATE VIRTUAL TABLE books_fts USING fts5(title, content='');
When I try to do Full Text Search with
SELECT rowid FROM books_fts WHERE title MATCH ?;
with parameter values 'усмi*' or 'ночiв*' I get no rows.
But with 'усм*' or 'ноч*' I get rowids for right corresponding tokens 'Усмiшки' or 'Ночiвля'.
Is it FTS5 bug in SQLite or I do something wrong?
Thanks in advance for the replies
Update:
sqlite3 fts.db
SQLite version 3.31.1 2020-01-27 19:55:54
Enter ".help" for usage hints.
sqlite> DROP TABLE IF EXISTS books_fts;
sqlite> CREATE VIRTUAL TABLE books_fts USING fts5(title, content='');
sqlite> INSERT INTO books_fts (rowid, title) VALUES (1, 'Усмiшки');
sqlite> INSERT INTO books_fts (rowid, title) VALUES (2, 'Ночiвля');
sqlite> SELECT rowid FROM books_fts WHERE title MATCH 'усм*';
1
sqlite> SELECT rowid FROM books_fts WHERE title MATCH 'усмi*';
1
sqlite> SELECT rowid FROM books_fts WHERE title MATCH 'ноч*';
2
sqlite> SELECT rowid FROM books_fts WHERE title MATCH 'ночiв*';
2
sqlite> .q
I think something wrong with encoding in my code. I'll try to find out
Update 2: As I managed to find out today, the problem is more complicated than I supposed.
The case is that in database UTF-8 letter "i" is Cyrillic 0xd196 / 0xd1 0x96 (2 bytes) but in match pattern "i" is Latin 0x69 (1 byte)
I have tried to tune tokenizer
CREATE VIRTUAL TABLE books_fts USING fts5(title, content='', tokenize = 'unicode61 remove_diacritics 2');
but it didn't help because it seems it works only for Latin diacritics :(
Has anyone encountered a similar problem and found the solution?
From all the above, I conclude that the program code and the SQLite3 work correctly as they should. And to perform a full text search the user need a keyboard with the desired native language layout on the phone for the correct input of national letters.
Thanks again everyone.