How to make query faster?


My tracks table, with ~3 million records (growing by ~500/day), has around 30 columns, of which I use about 15 in the WHERE clause. The query takes 4800 ms on average with no other users or processes using the database. How can I make it faster? I would like to see results closer to 100 ms.

People looking for songs (tracks) fill out a form:

  • string -> for "Song title or artist name"
  • string -> for "Genre"
  • date -> for "Released after"
  • min/max integer pairs for the 12 remaining parameters

99% of use cases run this SELECT query:

SELECT
    "public"."tracks"."sys_id",
    "public"."tracks"."all_artists",
    "public"."tracks"."name",
    "public"."tracks"."genres",
    "public"."tracks"."release_date",
    "public"."tracks"."tempo",
    "public"."tracks"."popularity",
    "public"."tracks"."danceability",
    "public"."tracks"."energy",
    "public"."tracks"."speechiness",
    "public"."tracks"."acousticness",
    "public"."tracks"."instrumentalness",
    "public"."tracks"."liveness",
    "public"."tracks"."valence",
    "public"."tracks"."main_artist_popularity",
    "public"."tracks"."main_artist_followers",
    "public"."tracks"."key",
    "public"."tracks"."preview_url"
FROM
    "public"."tracks"
WHERE
    (
    "public"."tracks"."name" LIKE '%oultec%'
    OR "public"."tracks"."all_artists_string" LIKE '%oultec%'
    )
    AND ("public"."tracks"."genres_string" LIKE '%rum%')
    AND "public"."tracks"."tempo" >= '80'
    AND "public"."tracks"."tempo" <= '210'
    AND "public"."tracks"."popularity" >= '0'
    AND "public"."tracks"."popularity" <= '100'
    AND "public"."tracks"."main_artist_popularity" >= '1'
    AND "public"."tracks"."main_artist_popularity" <= '100'
    AND "public"."tracks"."main_artist_followers" >= '1'
    AND "public"."tracks"."main_artist_followers" <= '50000000'
    AND "public"."tracks"."danceability" >= '0'
    AND "public"."tracks"."danceability" <= '1000'
    AND "public"."tracks"."energy" >= '0'
    AND "public"."tracks"."energy" <= '1000'
    AND "public"."tracks"."speechiness" >= '0'
    AND "public"."tracks"."speechiness" <= '1000'
    AND "public"."tracks"."acousticness" >= '0'
    AND "public"."tracks"."acousticness" <= '1000'
    AND "public"."tracks"."instrumentalness" >= '0'
    AND "public"."tracks"."instrumentalness" <= '1000'
    AND "public"."tracks"."liveness" >= '0'
    AND "public"."tracks"."liveness" <= '1000'
    AND "public"."tracks"."valence" >= '0'
    AND "public"."tracks"."valence" <= '1000'
    AND "public"."tracks"."release_date" >= '2020-01-01'
    AND "public"."tracks"."key" = '10'
ORDER BY
    "public"."tracks"."release_date" DESC,
    "public"."tracks"."popularity" DESC,
    "public"."tracks"."sys_id" ASC
LIMIT 5 OFFSET 0;

Indexes (indices):

PRIMARY sys_id
UNIQUE  main_artist, name, duration_ms
INDEX   energy
INDEX   tempo, popularity, main_artist_popularity, main_artist_followers, danceability, energy, speechiness, acousticness, instrumentalness, liveness, valence, name, all_artists_string, genres_string, release_date, key

EXPLAIN/ANALYZE :

Limit  (cost=308411.32..308411.33 rows=1 width=279) (actual time=4582.756..4584.010 rows=0 loops=1)
  ->  Sort  (cost=308411.32..308411.33 rows=1 width=279) (actual time=4546.306..4547.559 rows=0 loops=1)
        Sort Key: release_date DESC, popularity DESC, sys_id
        Sort Method: quicksort  Memory: 25kB
        ->  Gather  (cost=1000.00..308411.31 rows=1 width=279) (actual time=4546.121..4547.374 rows=0 loops=1)
              Workers Planned: 2
              Workers Launched: 2
              ->  Parallel Seq Scan on tracks  (cost=0.00..307411.21 rows=1 width=279) (actual time=4473.923..4473.926 rows=0 loops=3)
                    Filter: ((genres_string ~~ '%rum%'::text) AND (tempo >= '80'::smallint) AND (tempo <= '210'::smallint) AND (popularity >= '0'::smallint) AND (popularity <= '100'::smallint) AND (main_artist_popularity >= '1'::smallint) AND (main_artist_popularity <= '100'::smallint) AND (main_artist_followers >= 1) AND (main_artist_followers <= 50000000) AND (danceability >= '0'::smallint) AND (danceability <= '1000'::smallint) AND (energy >= '0'::smallint) AND (energy <= '1000'::smallint) AND (speechiness >= '0'::smallint) AND (speechiness <= '1000'::smallint) AND (acousticness >= '0'::smallint) AND (acousticness <= '1000'::smallint) AND (instrumentalness >= '0'::smallint) AND (instrumentalness <= '1000'::smallint) AND (liveness >= '0'::smallint) AND (liveness <= '1000'::smallint) AND (valence >= '0'::smallint) AND (valence <= '1000'::smallint) AND (release_date >= '2020-01-01'::date) AND (key = '10'::smallint) AND ((name ~~ '%oultec%'::text) OR (all_artists_string ~~ '%oultec%'::text)))
                    Rows Removed by Filter: 1034268
Planning Time: 5.145 ms
JIT:
  Functions: 14
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 18.941 ms, Inlining 0.000 ms, Optimization 8.644 ms, Emission 110.864 ms, Total 138.449 ms
Execution Time: 4667.446 ms

PostgreSQL running from the official postgres:14.1-alpine image, on:

  • Ubuntu-20.04-x86_64
  • 2 CPU
  • 2GB RAM
  • 20GB SSD Drive

[screenshot: system htop output]

Table structure:

[screenshot: tracks table structure]

The website that runs the queries (through an API/backend; it has more min/max integer fields than shown here):

[screenshot: form view for running the query]

1 Answer

Answered by O. Jones:

Your query does LIKE '%something%' substring searches plus range scans on dates and numbers. But a BTREE index, the default kind, can satisfy only one range condition per scan, and it can't handle a leading-wildcard LIKE '%something%' at all. So you are getting a full table scan for every query; 4.8 s for three million rows isn't too bad, considering.
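To see the limitation directly, compare the plans for a single range condition against a leading-wildcard LIKE (a sketch against the poster's tracks table; the exact plan depends on statistics and data distribution):

```sql
-- A BTREE index on tempo can satisfy one range condition;
-- the planner may choose an Index Scan here:
EXPLAIN SELECT sys_id FROM tracks WHERE tempo BETWEEN 80 AND 210;

-- A leading-wildcard LIKE cannot use a BTREE index at all,
-- so this falls back to a Seq Scan:
EXPLAIN SELECT sys_id FROM tracks WHERE name LIKE '%oultec%';
```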

For your LIKE '%something%' column searches you could try trigram indexes, a PostgreSQL feature. The code below creates trigram indexes on name, all_artists_string, and genres_string. That might narrow down the candidate rows so you have to scan less data.

CREATE EXTENSION IF NOT EXISTS pg_trgm;  -- no-op if the extension is already installed
CREATE INDEX CONCURRENTLY tracks_name
     ON tracks
  USING GIN (name gin_trgm_ops);
CREATE INDEX CONCURRENTLY tracks_all_artists_string
     ON tracks
  USING GIN (all_artists_string gin_trgm_ops);
CREATE INDEX CONCURRENTLY tracks_genres_string
     ON tracks
  USING GIN (genres_string gin_trgm_ops);
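Once the indexes exist, it's worth confirming the planner actually picks them up. In a successful case the plan should show Bitmap Index Scans on the new GIN indexes instead of the Parallel Seq Scan from the question (this is a sketch; whether the planner uses the indexes depends on the selectivity of the pattern):

```sql
-- Verify index usage on the text predicates in isolation:
EXPLAIN (ANALYZE, BUFFERS)
SELECT sys_id
  FROM tracks
 WHERE name LIKE '%oultec%'
    OR all_artists_string LIKE '%oultec%';
```

Note that very short patterns (fewer than three characters) produce no complete trigrams, so they gain little from these indexes.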

But you'll still have to scan through all the matching tracks.

If you create these indexes and then refactor the first bits of your WHERE clause to use set computations like so, you may (or may not) get even better performance.

WHERE sys_id IN (
     (SELECT sys_id FROM tracks WHERE name LIKE '%oultec%'
       UNION 
      SELECT sys_id FROM tracks WHERE all_artists_string LIKE '%oultec%'
     )
     INTERSECT
     SELECT sys_id FROM tracks WHERE genres_string LIKE '%rum%'
    )
  AND tempo >= '80' ... 

But the fact remains that SQL isn't great at combining this many simultaneous range conditions.