PostgreSQL slow query

438 views Asked by At

Having OpenVAS (backed by PostgreSQL) instance that is slow when opening 'Tasks' tab.

Following query runs 22 seconds in PostgreSQL. Any suggestions how this can be optimized?

SELECT id, host,
       iso_time (start_time), iso_time (end_time),
       current_port, max_port, report,
       (SELECT uuid FROM reports WHERE id = report),
       (SELECT uuid FROM hosts
        WHERE id = (SELECT host FROM host_identifiers
                    WHERE source_type = 'Report Host'
                      AND name = 'ip'
                      AND source_id = (SELECT uuid FROM reports
                                       WHERE id = report)
                      AND value = report_hosts.host
                    LIMIT 1)
       )
FROM report_hosts
WHERE report = 702;

The plan is

 Index Scan using report_hosts_by_report on report_hosts  (cost=0.42..1975570.99 rows=447 width=38) (actual time=50.042..22979.257 rows=1206 loops=1)
   Index Cond: (report = 702)
   SubPlan 1
     ->  Index Scan using reports_pkey on reports  (cost=0.28..2.49 rows=1 width=37) (actual time=0.004..0.004 rows=1 loops=1206)
           Index Cond: (id = report_hosts.report)
   SubPlan 4
     ->  Index Scan using hosts_pkey on hosts  (cost=4414.37..4416.59 rows=1 width=37) (actual time=0.001..0.001 rows=0 loops=1206)
           Index Cond: (id = $4)
           InitPlan 3 (returns $4)
             ->  Limit  (cost=2.49..4414.09 rows=1 width=4) (actual time=18.998..18.998 rows=0 loops=1206)
                   InitPlan 2 (returns $2)
                     ->  Index Scan using reports_pkey on reports reports_1  (cost=0.28..2.49 rows=1 width=37) (actual time=0.001..0.001 rows=1 loops=1206)
                           Index Cond: (id = report_hosts.report)
                   ->  Seq Scan on host_identifiers  (cost=0.00..4411.60 rows=1 width=4) (actual time=18.997..18.997 rows=0 loops=1206)
                         Filter: ((source_type = 'Report Host'::text) AND (name = 'ip'::text) AND (source_id = $2) AND (value = report_hosts.host))
                         Rows Removed by Filter: 99459
 Planning time: 0.531 ms
 Execution time: 22979.575 ms
2

There are 2 answers

0
Laurenz Albe On BEST ANSWER

All the time is spent in the 1206 sequential scans of host_identifiers.

Try to replace the subqueries with joins:

SELECT rh.id, rh.host,
       iso_time(rh.start_time), iso_time(rh.end_time),
       rh.current_port, rh.max_port, rh.report,
       r.uuid,
       h.uuid
FROM report_hosts AS rh
   LEFT JOIN reports AS r
      ON rh.report = r.id
   LEFT JOIN host_identifiers AS hi
      ON hi.source_id = r.uuid
         AND hi.value = rh.host
         AND hi.source_type = 'Report Host'
         AND hi.name = 'ip'
   LEFT JOIN hosts AS h
      ON h.id = hi.host
WHERE rh.report = 702;

This is not exactly the same because it does not account for the LIMIT 1 that makes little sense without an ORDER BY, but it should be close to the truth.

Proper indexes will make it fast (if they don't yet exist):

  • one on reports(id)
  • one on host_identifiers(source_id, value)
  • one on hosts(id)

Your query is hard to read because you don't qualify the columns with the table name.

0
Romitaro On

Wow! Adding index host_identifiers(source_id, value) is exactly what I was looking for:

create INDEX host_identifiers_source_id_value on host_identifiers(source_id, value);

Page load time for 'Tasks' tab decreased from 70s to just 13s.

Thanks!