My tracks table has ~3 million records (growing by ~500/day) and around 30 columns, of which I use about 15 in the WHERE clause. The query takes 4,800 ms on average, with no other users or processes touching the database. How can I make it faster? I would like to see results closer to 100 ms.
People looking for songs (tracks) fill out a form:
- string -> for "Song title or artist name"
- string -> for "Genre"
- date -> for "Released after"
- pairs of integers -> min/max for each of the 12 remaining parameters
99% of use cases boil down to this SELECT query:
SELECT
    "public"."tracks"."sys_id",
    "public"."tracks"."all_artists",
    "public"."tracks"."name",
    "public"."tracks"."genres",
    "public"."tracks"."release_date",
    "public"."tracks"."tempo",
    "public"."tracks"."popularity",
    "public"."tracks"."danceability",
    "public"."tracks"."energy",
    "public"."tracks"."speechiness",
    "public"."tracks"."acousticness",
    "public"."tracks"."instrumentalness",
    "public"."tracks"."liveness",
    "public"."tracks"."valence",
    "public"."tracks"."main_artist_popularity",
    "public"."tracks"."main_artist_followers",
    "public"."tracks"."key",
    "public"."tracks"."preview_url"
FROM
    "public"."tracks"
WHERE
    (
        "public"."tracks"."name" LIKE '%oultec%'
        OR "public"."tracks"."all_artists_string" LIKE '%oultec%'
    )
    AND ("public"."tracks"."genres_string" LIKE '%rum%')
    AND "public"."tracks"."tempo" >= '80'
    AND "public"."tracks"."tempo" <= '210'
    AND "public"."tracks"."popularity" >= '0'
    AND "public"."tracks"."popularity" <= '100'
    AND "public"."tracks"."main_artist_popularity" >= '1'
    AND "public"."tracks"."main_artist_popularity" <= '100'
    AND "public"."tracks"."main_artist_followers" >= '1'
    AND "public"."tracks"."main_artist_followers" <= '50000000'
    AND "public"."tracks"."danceability" >= '0'
    AND "public"."tracks"."danceability" <= '1000'
    AND "public"."tracks"."energy" >= '0'
    AND "public"."tracks"."energy" <= '1000'
    AND "public"."tracks"."speechiness" >= '0'
    AND "public"."tracks"."speechiness" <= '1000'
    AND "public"."tracks"."acousticness" >= '0'
    AND "public"."tracks"."acousticness" <= '1000'
    AND "public"."tracks"."instrumentalness" >= '0'
    AND "public"."tracks"."instrumentalness" <= '1000'
    AND "public"."tracks"."liveness" >= '0'
    AND "public"."tracks"."liveness" <= '1000'
    AND "public"."tracks"."valence" >= '0'
    AND "public"."tracks"."valence" <= '1000'
    AND "public"."tracks"."release_date" >= '2020-01-01'
    AND "public"."tracks"."key" = '10'
ORDER BY
    "public"."tracks"."release_date" DESC,
    "public"."tracks"."popularity" DESC,
    "public"."tracks"."sys_id" ASC
LIMIT 5 OFFSET 0;
Indexes:
- PRIMARY KEY (sys_id)
- UNIQUE (main_artist, name, duration_ms)
- INDEX (energy)
- INDEX (tempo, popularity, main_artist_popularity, main_artist_followers, danceability, energy, speechiness, acousticness, instrumentalness, liveness, valence, name, all_artists_string, genres_string, release_date, key)
EXPLAIN ANALYZE:
Limit  (cost=308411.32..308411.33 rows=1 width=279) (actual time=4582.756..4584.010 rows=0 loops=1)
  ->  Sort  (cost=308411.32..308411.33 rows=1 width=279) (actual time=4546.306..4547.559 rows=0 loops=1)
        Sort Key: release_date DESC, popularity DESC, sys_id
        Sort Method: quicksort  Memory: 25kB
        ->  Gather  (cost=1000.00..308411.31 rows=1 width=279) (actual time=4546.121..4547.374 rows=0 loops=1)
              Workers Planned: 2
              Workers Launched: 2
              ->  Parallel Seq Scan on tracks  (cost=0.00..307411.21 rows=1 width=279) (actual time=4473.923..4473.926 rows=0 loops=3)
                    Filter: ((genres_string ~~ '%rum%'::text) AND (tempo >= '80'::smallint) AND (tempo <= '210'::smallint) AND (popularity >= '0'::smallint) AND (popularity <= '100'::smallint) AND (main_artist_popularity >= '1'::smallint) AND (main_artist_popularity <= '100'::smallint) AND (main_artist_followers >= 1) AND (main_artist_followers <= 50000000) AND (danceability >= '0'::smallint) AND (danceability <= '1000'::smallint) AND (energy >= '0'::smallint) AND (energy <= '1000'::smallint) AND (speechiness >= '0'::smallint) AND (speechiness <= '1000'::smallint) AND (acousticness >= '0'::smallint) AND (acousticness <= '1000'::smallint) AND (instrumentalness >= '0'::smallint) AND (instrumentalness <= '1000'::smallint) AND (liveness >= '0'::smallint) AND (liveness <= '1000'::smallint) AND (valence >= '0'::smallint) AND (valence <= '1000'::smallint) AND (release_date >= '2020-01-01'::date) AND (key = '10'::smallint) AND ((name ~~ '%oultec%'::text) OR (all_artists_string ~~ '%oultec%'::text)))
                    Rows Removed by Filter: 1034268
Planning Time: 5.145 ms
JIT:
  Functions: 14
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 18.941 ms, Inlining 0.000 ms, Optimization 8.644 ms, Emission 110.864 ms, Total 138.449 ms
Execution Time: 4667.446 ms
PostgreSQL is running from the "official" image postgres:14.1-alpine, on:
- Ubuntu-20.04-x86_64
- 2 CPU
- 2GB RAM
- 20GB SSD Drive
Table structure: (screenshot not reproduced here)
Website that runs the queries (through an API/backend; it has more min/max integer fields than shown here): (screenshot not reproduced here)
Your query does LIKE '%something%' substring searches plus range scans on dates and numbers. But B-tree indexes, the default kind, can handle only one range scan per index, and they can't handle LIKE '%something%' (a leading wildcard) at all. So you are getting a full table scan for every query; considering that, 4.8 s over three million rows isn't too bad.
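To make the B-tree limitation concrete, here is a small illustration (my own sketch, not from the original post; the index name is made up). A left-anchored pattern can use a B-tree index, while a leading wildcard cannot:

-- A B-tree index with text_pattern_ops supports left-anchored LIKE
-- even when the database uses a non-C collation.
CREATE INDEX tracks_name_prefix_idx ON "public"."tracks" (name text_pattern_ops);

-- Can use the index: the pattern is anchored at the start.
SELECT sys_id FROM "public"."tracks" WHERE name LIKE 'oultec%';

-- Cannot use a B-tree index: the leading % defeats it.
SELECT sys_id FROM "public"."tracks" WHERE name LIKE '%oultec%';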
For your column LIKE '%something%' searches you could try trigram indexes, a PostgreSQL feature provided by the pg_trgm extension. A trigram index on name (sketched below) might narrow down the choices so you have to scan less data. But you'll still have to scan through all the matching tracks.
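A minimal sketch of that DDL, assuming the pg_trgm extension is available (it ships in PostgreSQL's contrib package; the index name is illustrative):

-- Enable trigram support.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- GIN trigram index on name; LIKE '%oultec%' can then use it.
CREATE INDEX tracks_name_trgm_idx
    ON "public"."tracks" USING GIN (name gin_trgm_ops);

-- The same pattern would apply to all_artists_string and genres_string
-- if you want the '%...%' searches on those columns indexed too.

Note that trigram indexes only help when the search pattern is at least three characters long, and they add noticeable disk and write overhead, so test on a copy of the data first.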
If you create these indexes and then refactor the first part of your WHERE clause to use set operations, as sketched below, you may (or may not) get even better performance.
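One possible shape of that rewrite (a sketch under my own assumptions, not a drop-in replacement): the OR is turned into a UNION of sys_id sets, so each arm can use its own trigram index, and the remaining filters are applied on top:

SELECT t.*
FROM "public"."tracks" AS t
WHERE t.sys_id IN (
        -- Each arm of the UNION can use its own trigram index.
        SELECT sys_id FROM "public"."tracks" WHERE name LIKE '%oultec%'
        UNION
        SELECT sys_id FROM "public"."tracks" WHERE all_artists_string LIKE '%oultec%'
    )
    AND t.genres_string LIKE '%rum%'
    AND t.release_date >= '2020-01-01'
    AND t.key = '10'
    -- ... the numeric min/max range filters stay as they were ...
ORDER BY t.release_date DESC, t.popularity DESC, t.sys_id ASC
LIMIT 5 OFFSET 0;

Whether this beats the single-pass plan depends on how selective the patterns are; compare both with EXPLAIN (ANALYZE, BUFFERS).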
But the fact remains that SQL isn't great for all these range scans.