How to use a GIST or GIN index with an hstore column in Postgresql?

3.1k views Asked by At

I'm playing around with postgresql 9.3's hstore. I'm trying to use and index for an hstore column just like documentation states. MY problem is that the index appear not to be used. Let me give you an example:

I created a table 'Person':

=# CREATE TABLE Person (Id BIGSERIAL PRIMARY KEY NOT NULL, Values hstore);

And inserted a test value:

=# INSERT INTO Person (Values, 'a=>1,b=>3');

Then if I EXPLAIN a SELECT query which uses operator "@>" on 'Values' column, I unsurprisingly get:

=# EXPLAIN SELECT P.* FROM Person AS P WHERE P.Values @> hstore('a', '1');
                        QUERY PLAN                        
----------------------------------------------------------
 Seq Scan on person p  (cost=0.00..24.50 rows=1 width=40)
   Filter: ("values" @> '"a"=>"1"'::hstore)

No index < - > sequential scan. Makes sense. Anyway, it doesn't matter if I create a GIN or GIST index, the explain keeps talking about sequential scan:

=# CREATE INDEX IX_GIN_VALUES ON Person USING GIN (values);
CREATE INDEX

=# EXPLAIN SELECT P.* FROM Person P WHERE P.values @> hstore('a', '1');

                        QUERY PLAN                        
----------------------------------------------------------
 Seq Scan on person p  (cost=0.00..1.01 rows=1 width=246)
   Filter: ("values" @> '"age"=>"2"'::hstore)

Maybe I'm missing something obvious?

2

There are 2 answers

4
Denis de Bernardy On

If you're just playing with it, be sure to add enough data for the index scan to make sense. If you only have a few rows, or if many row contains similar values (i.e. your where criteria isn't selective enough), a seq scan will usually be faster than an index scan.

Also, be sure to analyze your table after filling it with your test data.


Some extra reading for @maxm:

(Performance has greatly improved since the latter was written.)

Why isn't his/her index being used?

Because it's faster for Postgres to seq scan the entire table (which has a single row) and filter the row out of a single disk page, than it is to do the index look up, and then seq scan the table likewise in order to retrieve the row's data.

Is there an issue w/ how the asker is creating their index?

None, but see the above link on when it's better to use normalized data.

And prefer json or jsonb rather than hstore.

Querying the hstore column? What needs to be fixed so that the SELECT query uses such an index?

Nothing, but again see the above link on when it's better to use normalized data.

0
solidsnack On

In a nutshell: when there are few pages in a table, Postgres's planner prefers to skip the indexes and just load and scan the rows.

CREATE SCHEMA stackoverflow20589058;
--- CREATE SCHEMA

SET search_path TO stackoverflow20589058,"$user",public;
--- SET

CREATE EXTENSION hstore;
--- CREATE EXTENSION

CREATE TABLE Person (Id BIGSERIAL PRIMARY KEY NOT NULL, Values hstore);
--- CREATE TABLE

WITH Vals(n) AS (SELECT * FROM generate_series(1,10))
INSERT INTO Person (
  SELECT n AS Id, hstore('a=>'||n||', b=>'||n) AS Values FROM Vals
);
--- INSERT 0 10

EXPLAIN SELECT P.* FROM Person AS P WHERE P.Values @> hstore('a', '1');
---                         QUERY PLAN                        
--- ----------------------------------------------------------
---  Seq Scan on person p  (cost=0.00..24.50 rows=1 width=40)
---    Filter: ("values" @> '"a"=>"1"'::hstore)
--- (2 rows)

CREATE INDEX IX_GIN_VALUES ON Person USING GIN (values);
--- CREATE INDEX

------------------------- When there are few values, a sequential scan is
------------------------- often the best search strategy. Grabbing a few
------------------------- pages in sequence can be cheaper than making an
------------------------- extra disk seek to load the index.
EXPLAIN SELECT P.* FROM Person AS P WHERE P.Values @> hstore('a', '1');
---                        QUERY PLAN                        
--- ---------------------------------------------------------
---  Seq Scan on person p  (cost=0.00..1.12 rows=1 width=40)
---    Filter: ("values" @> '"a"=>"1"'::hstore)
--- (2 rows)

TRUNCATE Person;
--- TRUNCATE TABLE

WITH Vals(n) AS (SELECT * FROM generate_series(1,100000))
INSERT INTO Person (
  SELECT n AS Id, hstore('a=>'||n||', b=>'||n) AS Values FROM Vals
);
--- INSERT 0 100000

------------------------- When there are many rows, using the index can
------------------------- allow us to skip quite a lot of I/O; so
------------------------- Postgres's planner makes use of the index.
EXPLAIN SELECT P.* FROM Person AS P WHERE P.Values @> hstore('a', '1');
---                                    QUERY PLAN                                   
--- --------------------------------------------------------------------------------
---  Bitmap Heap Scan on person p  (cost=916.83..1224.56 rows=107 width=40)
---    Recheck Cond: ("values" @> '"a"=>"1"'::hstore)
---    ->  Bitmap Index Scan on ix_gin_values  (cost=0.00..916.80 rows=107 width=0)
---          Index Cond: ("values" @> '"a"=>"1"'::hstore)
--- (4 rows)

DROP SCHEMA stackoverflow20589058 CASCADE;
--- NOTICE:  drop cascades to 2 other objects
--- DETAIL:  drop cascades to extension hstore
--- drop cascades to table person
--- DROP SCHEMA