Why does vacuum analyze change query plan while analyze does not?

1.1k views Asked by At

I wanted to utilize the power of index-only scans in Postgres and experimented with one table:

CREATE TABLE dest.contexts
(
  id integer NOT NULL,
  phrase_id integer NOT NULL,
  lang character varying(5) NOT NULL,
  ranking_value double precision,
  index_min integer,
  index_max integer,
  case_sensitive boolean,
  is_enabled boolean,
  is_to_sync boolean NOT NULL DEFAULT true
);

insert into dest.contexts select * from source.contexts;

alter table dest.contexts
  add constraint pk_contexts primary key (id, phrase_id, lang);

 CREATE INDEX idx_contexts_
  ON dest.contexts
  USING btree
  (id, is_enabled, lang, phrase_id, ranking_value, index_min, index_max, case_sensitive);

The index covers all columns I want to use in the next query:

explain analyze
select ranking_value, index_min, index_max, case_sensitive
from dest.contexts
where id = 456 and is_enabled

I check the plan immediately after creation:

Bitmap Heap Scan on contexts  (cost=4.41..31.46 rows=12 width=17) (actual time=0.045..0.045 rows=0 loops=1)
  Recheck Cond: (id = 456)
  Filter: is_enabled
  ->  Bitmap Index Scan on idx_contexts_  (cost=0.00..4.40 rows=12 width=0) (actual time=0.038..0.038 rows=0 loops=1)
        Index Cond: ((id = 456) AND (is_enabled = true))
Planning time: 0.631 ms
Execution time: 0.093 ms

It is strange, but OK...

In several seconds it changes to another (autovacuum?)

Index Scan using pk_contexts on contexts  (cost=0.28..17.93 rows=6 width=17) (actual time=0.027..0.027 rows=0 loops=1)
  Index Cond: (id = 456)
  Filter: is_enabled
Planning time: 0.185 ms
Execution time: 0.070 ms

I try to force it to use index-only scan:

analyze dest.contexts

But it does not change anything. Then I do

vacuum verbose analyze dest.contexts;

INFO:  vacuuming "dest.contexts"
INFO:  index "pk_contexts" now contains 4845 row versions in 21 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "idx_contexts_" now contains 4845 row versions in 37 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "contexts": found 0 removable, 4845 nonremovable row versions in 41 out of 41 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "dest.contexts"
INFO:  "contexts": scanned 41 of 41 pages, containing 4845 live rows and 0 dead rows; 4845 rows in sample, 4845 estimated total rows

And here I finally get what I want:

Index Only Scan using idx_contexts_ on contexts  (cost=0.28..4.40 rows=6 width=17) (actual time=0.014..0.014 rows=0 loops=1)
  Index Cond: ((id = 456) AND (is_enabled = true))
  Filter: is_enabled
  Heap Fetches: 0
Planning time: 0.247 ms
Execution time: 0.052 ms

So here are the questions:

  1. Why does not analyze teach it to use the large 'all-covering' index?
  2. Why does vacuum analyze do it?
  3. My table was filled from the scratch with one large insert. Why does vacuum do anything at all? To my mind there is nothing to vacuum there.
1

There are 1 answers

0
samed.yildirim On

Analyze is responsible for collecting statistics, and analyze have effects on selection of query plan and plan costs.

Indexes do not contain visibility information about the tuples. Because of that, reading data pages and filtering on resultset is also applied event though all columns in select is in the index. If all tuples visible in a page, Postgres does not need additional filtering operations. Postgres uses visibility map (vm) to achieve that.

Vacuum uses and also updates visibility map while reclaiming dead tuples. Because of that, vacuum changes query plan for using index-only scan if possible.

https://www.postgresql.org/docs/9.6/static/sql-vacuum.html

https://www.postgresql.org/docs/9.6/static/storage-vm.html