We have a page where we show a list of results, and the results must be relevant given 2 factors:
- keyword similarity
- location
we are using postgresql postgis and ts_vectors, however, we don't know how to combine the scores coming of ts vectors and st_distance in order to have the "best" search results, the queries seem to be taking between 30 seconds and 1 minute.
SELECT [121/1808]
ts_rank_cd(doc_vectors, plainto_tsquery('Uber '), 1 | 4 | 32) AS rank, ts_headline('english', short_job_description, plainto_tsquery('Uber '), 'MaxWords=80,MinWords=50'),
-- a bunch of fields omitted...
org.logo
FROM jobs.job as job
LEFT OUTER JOIN jobs.organization as org
ON job.organization_id = org.id
WHERE job.is_expired = 0 and deleted_at is NULL and doc_vectors @@ plainto_tsquery('Uber ') order by rank desc offset 80 limit 20;
Do you guys have suggestions for us?
EXPLAIN (ANALYZE, BUFFERS) for same Query:
----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=886908.73..886908.81 rows=30 width=1108) (actual time=20684.508..20684.518 rows=30 loops=1)
Buffers: shared hit=1584 read=825114
-> Sort (cost=886908.68..889709.48 rows=1120318 width=1108) (actual time=20684.502..20684.509 rows=50 loops=1)
Sort Key: job.created_at DESC
Sort Method: top-N heapsort Memory: 75kB
Buffers: shared hit=1584 read=825114
-> Hash Left Join (cost=421.17..849692.52 rows=1120318 width=1108) (actual time=7.012..18887.816 rows=1111019 loops=1)
Hash Cond: (job.organization_id = org.id)
Buffers: shared hit=1581 read=825114
-> Seq Scan on job (cost=0.00..846329.53 rows=1120318 width=1001) (actual time=0.052..17866.594 rows=1111019 loops=1)
Filter: ((deleted_at IS NULL) AND (is_expired = 0) AND (is_hidden = 0))
Rows Removed by Filter: 196298
Buffers: shared hit=1564 read=824989
-> Hash (cost=264.41..264.41 rows=12541 width=107) (actual time=6.898..6.899 rows=12541 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 1037kB
Buffers: shared hit=14 read=125
-> Seq Scan on organization org (cost=0.00..264.41 rows=12541 width=107) (actual time=0.021..3.860 rows=12541 loops=1)
Buffers: shared hit=14 read=125
Planning time: 2.223 ms
Execution time: 20684.682 ms```