suggest like google with postgresql trigrams and full text search

5.4k views Asked by At

I want to do a text search like google suggestions.

I'm using PostgreSQL because of the magical Postgis.

I was thinking on using FTS, but I saw that it could not search partial words, so I found this question, and saw how trigrams works.

The main problem is that the search engine I'm working on is for spanish language. FTS worked great with stemming and dictionaries (synonyms, misspells), UTF and so on. Trigrams worked great for partial words, but they only work for ASCII, and (obviously) they don't use things like dictionaries.

I was thinking if is there any way in which the best things from both could be used.

Is it possible make Full Text Search and Trigrams to work together in PGSQL?

2

There are 2 answers

2
Neil McGuigan On BEST ANSWER

You can do this in Postgres, and don't need Lucene.

You can quote phrases in tsquery or tsvector like the below. You can add a :* after a tsquery term to do a prefix search:

select
'''new york city'''::tsvector   @@ '''new yo'':*'::tsquery, --true
'''new york times'''::tsvector  @@ '''new yo'':*'::tsquery, --true
'''new york'''::tsvector        @@ '''new yo'':*'::tsquery, --true
'''new'''::tsvector             @@ '''new yo'':*'::tsquery, --false
'new'::tsvector                 @@ '''new yo'':*'::tsquery, --false
'new york'::tsvector            @@ '''new yo'':*'::tsquery  --false

The main problem is that to_tsvector() and [plain]to_tsquery() will strip your quotes. You can write your own versions that don't do this (it's not that hard), or do some post-processing after them to build your term n-grams.

The extra single quotes above are just escapes. select $$ i heart 'new york city' $$::tsvector; is equivalent.

2
aymeric On

I would recommend having a look at Lucene. It can be integrated natively in Java, easily in .NET or using SOLR and web services in php.

It has great capabilities for free text searches, term ranking out of the box, support for different languages using different Analysers (link for the spanish one).

Last but not least, it's also extremely fast (for big volumes, say 4Gb index ~ 5 000 000 rows in a DB, it's much faster than a Postgres database).