Search results using ts_vectors and ST_Distance

82 views Asked by At

We have a page where we show a list of results, and the results must be relevant given 2 factors:

  1. keyword similarity
  2. location

we are using postgresql postgis and ts_vectors, however, we don't know how to combine the scores coming of ts vectors and st_distance in order to have the "best" search results, the queries seem to be taking between 30 seconds and 1 minute.

SELECT                                                                                                                                                                                                                [121/1808]
ts_rank_cd(doc_vectors, plainto_tsquery('Uber '), 1 | 4 | 32) AS rank, ts_headline('english', short_job_description, plainto_tsquery('Uber '), 'MaxWords=80,MinWords=50'),     

-- a bunch of fields omitted...                                                                                                                                                                                            
org.logo                                                                                                                                                                                                                       
FROM jobs.job as job                                                                                                                                                                                                           
LEFT OUTER JOIN jobs.organization as org                                                                                                                                                                                       
ON job.organization_id = org.id
WHERE job.is_expired = 0 and deleted_at is NULL and doc_vectors @@ plainto_tsquery('Uber ') order by rank desc offset 80 limit 20;

Do you guys have suggestions for us?

EXPLAIN (ANALYZE, BUFFERS) for same Query:

----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=886908.73..886908.81 rows=30 width=1108) (actual time=20684.508..20684.518 rows=30 loops=1)
   Buffers: shared hit=1584 read=825114
   ->  Sort  (cost=886908.68..889709.48 rows=1120318 width=1108) (actual time=20684.502..20684.509 rows=50 loops=1)
         Sort Key: job.created_at DESC
         Sort Method: top-N heapsort  Memory: 75kB
         Buffers: shared hit=1584 read=825114
         ->  Hash Left Join  (cost=421.17..849692.52 rows=1120318 width=1108) (actual time=7.012..18887.816 rows=1111019 loops=1)
               Hash Cond: (job.organization_id = org.id)
               Buffers: shared hit=1581 read=825114
               ->  Seq Scan on job  (cost=0.00..846329.53 rows=1120318 width=1001) (actual time=0.052..17866.594 rows=1111019 loops=1)
                     Filter: ((deleted_at IS NULL) AND (is_expired = 0) AND (is_hidden = 0))
                     Rows Removed by Filter: 196298
                     Buffers: shared hit=1564 read=824989
               ->  Hash  (cost=264.41..264.41 rows=12541 width=107) (actual time=6.898..6.899 rows=12541 loops=1)
                     Buckets: 16384  Batches: 1  Memory Usage: 1037kB
                     Buffers: shared hit=14 read=125
                     ->  Seq Scan on organization org  (cost=0.00..264.41 rows=12541 width=107) (actual time=0.021..3.860 rows=12541 loops=1)
                           Buffers: shared hit=14 read=125
 Planning time: 2.223 ms
 Execution time: 20684.682 ms```
0

There are 0 answers