How to highlight prefix match in SQLite FTS5

1.1k views Asked by At

I have a SQLite FTS5 virtual table and am trying to highlight text in my prefix query results. I am aware of the highlight() and snippet() auxiliary functions, however they don't seem to support exactly what I am trying to do. If my data looks like:

fts.my_data
-----------
John
Mike
Bill
Jane

and I want to query using a prefix match such as

select * from fts where fts match 'j*';

The highlight and snippet functions (assuming <b>...</b> tags) will return

<b>John</b>
<b>Jane</b>

But I only want to highlight the exact part of the prefix that was matched, before the wildcard:

<b>J</b>ohn
<b>J</b>ane

There does not seem to be any way to do this using the existing FTS5 auxiliary functions. I realize FTS5 offers an API so that you can create your own auxiliary functions. I might also be able to implement the solution in application code (I am using Swift), although I suspect this problem has the potential for a lot of issues trying to implement in application code (for example, how to handle stemming). Does anyone know if what I am trying to do is actually possible using the existing highlight and/or snippet functions before I go to the trouble of implementing my own solution? If so, could you explain how?

Also, I have observed several other existing apps (Contacts+ for example) offer this capability so I know it is possible somehow, and am also wondering how they do it if anyone knows how.

1

There are 1 answers

1
John Cleveland On

For anyone else looking for a resolution to this issue, I was able to figure out how to solve this issue.

The trigram tokenizer was added as one of the built-in tokenizers in SQLite 3.34.0 (released December 1, 2020).

In my case, I was deploying to iOS, which at this current time, only has SQLite 3.32 bundled with it by default, as confirmed by this wiki page. So I was able to download the SQLite source code and add it as a project to my XCode workspace as a "Static Library" and reference the resulting .a static library file from within my app's project. I also had to set the appropriate C Flags in the XCode compiler options for the SQLite project to get everything to work correctly. But I am now able to distribute my own supplied version of SQLite (3.34.1), compiled with my specific options, with FTS5 and the trigram tokenizer.