GIN index not used for small table when 0 rows returned

233 views Asked by At

In a Postgres 9.4 database, I created a GIN trigram index on a table called 'persons' that contains 1514 rows like the following:

CREATE INDEX persons_index_name_1 ON persons
USING gin (lower(name) gin_trgm_ops);

and a query that looks for similar names as follows:

select name, last_name from persons where lower(name) % 'thename'

So, I first issued a query with a name I knew beforehand that would have similar matches, so the explain analyze showed that the index I created was used in this query:

select name, last_name from persons where lower(name) % 'george'

And the results were the expected:

  ->  Bitmap Heap Scan on persons  (cost=52.01..58.72 rows=2 width=26) (actual time=0.054..0.065 rows=1 loops=1)
        Recheck Cond: (lower((name)::text) % 'george'::text)
        Rows Removed by Index Recheck: 2
        Heap Blocks: exact=1
        ->  Bitmap Index Scan on persons_index_name_1  (cost=0.00..52.01 rows=2 width=0) (actual time=0.032..0.032 rows=3 loops=1)
              Index Cond: (lower((name)::text) % 'george'::text)
...
Execution time: 1.382 ms"

So, out of curiosity, I wanted to see if the index was used when the thename parameter contained a name that didn't exist at all in the table:

select name, last_name from persons where lower(name) % 'noname'

But I saw that in this case that the index was not used at all and the execution time was way higher:

  ->  Seq Scan on persons  (cost=0.00..63.72 rows=2 width=26) (actual time=6.494..6.494 rows=0 loops=1)
        Filter: (lower((name)::text) % 'noname'::text)
        Rows Removed by Filter: 1514
...
Execution time: 7.387 ms

As a test, I tried the same with a GIST index and in both cases, the index was used and the execution time was like the first case above.

I went ahead and recreated the table but this time inserting 10014 rows; and I saw that in both cases above, the GIN index was used and the execution time was the best for those cases.

Why is a GIN index is not used when the query above returns no results in a table with not so much rows (1514 in my case)?

1

There are 1 answers

1
Erwin Brandstetter On BEST ANSWER

Trigram indexes are case insensitive, test with:

select 'case' <-> 'CASE' AS ci1
     , 'case' %   'CASE' AS ci2
     , 'CASE' <-> 'CASE' AS c1
     , 'CASE' %   'CASE' AS c2;

So you might as well just:

CREATE INDEX persons_index_name_1 ON persons USING gin (name gin_trgm_ops);

And:

select name, last_name from persons where name % 'thename';

As to your actual question, for small tables an index look-up might not pay. That's exactly what your added tests demonstrate. And establishing that nothing matches can be more expensive than finding some matches.

Aside from that, your cost setting and / or table statistics may not be at their respective optimum to let Postgres pick the most adequate query plans.

The expected cost numbers translate to much higher actual cost for the sequential scan than for the bitmap index scan. You may be overestimating the cost of index scans as compared to sequential scans. random_page_cost (and cpu_index_tuple_cost) may be set too high and effective_cache_size too low.