Optimizing GreenplumDB Query

137 views Asked by At

I am running a query in a Greenplum Database with 4 segment servers, 64GB in each server and with a great processing capacity. The main table ¨pic_isup_new¨ have 500 MM rows and the response time is very slow (300 seconds for 100000 rows) when I execute the query. Is there a way to optimize the query or should I look for Greenplum configuration?.

SELECT pic_isup_new.begin_time::date, date_part('hour', pic_isup_new.begin_time) AS begin_hour, pic_isup_new.cause_value, pic_isup_new.conversation, pic_isup_new.a_number, pic_isup_new.b_number, causas_liberacion.id_grupo, "substring"(pic_isup_new.b_number, '......$'::text) AS serie, 
    CASE
        WHEN length(pic_isup_new.b_number) = 7 THEN "substring"(pic_isup_new.b_number, 1, 1)
        WHEN length(pic_isup_new.b_number) = 8 THEN "substring"(pic_isup_new.b_number, 2, 1)
        WHEN length(pic_isup_new.b_number) = 10 AND "substring"(pic_isup_new.b_number, 1, 1) = '0'::text THEN "substring"(pic_isup_new.b_number, 4, 1)
        WHEN length(pic_isup_new.b_number) = 10 AND ("substring"(pic_isup_new.b_number, 1, 3) = '906'::text OR "substring"(pic_isup_new.b_number, 1, 3) = '903'::text) AND length(pic_isup_new.a_number) = 8 THEN "substring"(pic_isup_new.b_number, 4, 1)
        ELSE ''::text
    END AS prefijo, 
    CASE
        WHEN length(pic_isup_new.b_number) = 7 THEN centrales.codigo_area
        WHEN length(pic_isup_new.b_number) = 8 THEN "substring"(pic_isup_new.b_number, 1, 1)
        WHEN length(pic_isup_new.b_number) = 10 AND "substring"(pic_isup_new.b_number, 1, 1) = '0'::text THEN "substring"(pic_isup_new.b_number, 3, 1)
        WHEN length(pic_isup_new.b_number) = 10 AND ("substring"(pic_isup_new.b_number, 1, 3) = '906'::text OR "substring"(pic_isup_new.b_number, 1, 3) = '903'::text) AND length(pic_isup_new.a_number) = 8 THEN centrales.codigo_area
        ELSE ''::text
    END AS codigo_area, series.serie AS serie_id, departamento.departamento FROM ndm_comovi.pic_isup_new JOIN ndm_comovi.causas_liberacion ON causas_liberacion.id::text = pic_isup_new.cause_value JOIN ndm_comovi.centrales centrales ON pic_isup_new.dpc_decimal = centrales.pointcode_decimal JOIN ndm_comovi.series ON ((CASE
        WHEN length(pic_isup_new.b_number) = 7 THEN centrales.codigo_area
        WHEN length(pic_isup_new.b_number) = 8 THEN "substring"(pic_isup_new.b_number, 1, 1)
        WHEN length(pic_isup_new.b_number) = 10 AND "substring"(pic_isup_new.b_number, 1, 1) = '0'::text THEN "substring"(pic_isup_new.b_number, 3, 1)
        WHEN length(pic_isup_new.b_number) = 10 AND ("substring"(pic_isup_new.b_number, 1, 3) = '906'::text OR "substring"(pic_isup_new.b_number, 1, 3) = '903'::text) AND length(pic_isup_new.a_number) = 8 THEN centrales.codigo_area
        ELSE ''::text
    END) = series.codigo_area AND (CASE
        WHEN length(pic_isup_new.b_number) = 7 THEN "substring"(pic_isup_new.b_number, 1, 1)
        WHEN length(pic_isup_new.b_number) = 8 THEN "substring"(pic_isup_new.b_number, 2, 1)
        WHEN length(pic_isup_new.b_number) = 10 AND "substring"(pic_isup_new.b_number, 1, 1) = '0'::text THEN "substring"(pic_isup_new.b_number, 4, 1)
        WHEN length(pic_isup_new.b_number) = 10 AND ("substring"(pic_isup_new.b_number, 1, 3) = '906'::text OR "substring"(pic_isup_new.b_number, 1, 3) = '903'::text) AND length(pic_isup_new.a_number) = 8 THEN "substring"(pic_isup_new.b_number, 4, 1)
        ELSE ''::text
    END) = series.prefijo AND "substring"(pic_isup_new.b_number, '......$'::text) >= series.serie_inicial AND "substring"(pic_isup_new.b_number, '......$'::text) <= series.serie_final)
JOIN ndm_comovi.departamento ON series.serie = departamento.serie

Here I am adding the explain plan

QUERY PLAN  Gather Motion 8:1  (slice5; segments: 8)  (cost=825.59..11841.02 rows=5 width=157)  ->  Hash Join  (cost=825.59..11841.02 rows=1 width=157) 
    Hash Cond: ndm_comovi.pic_isup_new.dpc_decimal = centrales.pointcode_decimal    
    Join Filter: CASE WHEN length(ndm_comovi.pic_isup_new.b_number) = 7 THEN centrales.codigo_area WHEN length(ndm_comovi.pic_isup_new.b_number) = 8 THEN "substring"(ndm_comovi.pic_isup_new.b_number, 1, 1) WHEN length(ndm_comovi.pic_isup_new.b_number) = 10 AND "substring"(ndm_comovi.pic_isup_new.b_number, 1, 1) = '0'::text THEN "substring"(ndm_comovi.pic_isup_new.b_number, 3, 1) WHEN length(ndm_comovi.pic_isup_new.b_number) = 10 AND ("substring"(ndm_comovi.pic_isup_new.b_number, 1, 3) = '906'::text OR "substring"(ndm_comovi.pic_isup_new.b_number, 1, 3) = '903'::text) AND length(ndm_comovi.pic_isup_new.a_number) = 8 THEN centrales.codigo_area ELSE ''::text END = series.codigo_area    
    ->  Redistribute Motion 8:8  (slice4; segments: 8)  (cost=601.34..11615.38 rows=2 width=161)    
          Hash Key: ndm_comovi.pic_isup_new.dpc_decimal 
          ->  Hash Join  (cost=601.34..11615.06 rows=2 width=161)   
                Hash Cond: CASE WHEN length(ndm_comovi.pic_isup_new.b_number) = 7 THEN "substring"(ndm_comovi.pic_isup_new.b_number, 1, 1) WHEN length(ndm_comovi.pic_isup_new.b_number) = 8 THEN "substring"(ndm_comovi.pic_isup_new.b_number, 2, 1) WHEN length(ndm_comovi.pic_isup_new.b_number) = 10 AND "substring"(ndm_comovi.pic_isup_new.b_number, 1, 1) = '0'::text THEN "substring"(ndm_comovi.pic_isup_new.b_number, 4, 1) WHEN length(ndm_comovi.pic_isup_new.b_number) = 10 AND ("substring"(ndm_comovi.pic_isup_new.b_number, 1, 3) = '906'::text OR "substring"(ndm_comovi.pic_isup_new.b_number, 1, 3) = '903'::text) AND length(ndm_comovi.pic_isup_new.a_number) = 8 THEN "substring"(ndm_comovi.pic_isup_new.b_number, 4, 1) ELSE ''::text END = series.prefijo  
                Join Filter: "substring"(ndm_comovi.pic_isup_new.b_number, '......$'::text) >= series.serie_inicial AND "substring"(ndm_comovi.pic_isup_new.b_number, '......$'::text) <= series.serie_final    
                ->  Hash Join  (cost=14.20..10945.25 rows=558 width=144)    
                      Hash Cond: ndm_comovi.pic_isup_new.cause_value = causas_liberacion.id::text   
                      ->  Append  (cost=0.00..10348.36 rows=17985 width=140)    
                            ->  Seq Scan on pic_isup_new_1_prt_1_2_prt_day1 pic_isup_new  (cost=0.00..0.00 rows=1 width=140)    
                            ->  Seq Scan on pic_isup_new_1_prt_1_2_prt_day2 pic_isup_new  (cost=0.00..0.00 rows=1 width=140)    
                            ->  Seq Scan on pic_isup_new_1_prt_1_2_prt_day3 pic_isup_new  (cost=0.00..0.00 rows=1 width=140)    
                            ->  Seq Scan on pic_isup_new_1_prt_1_2_prt_day4 pic_isup_new  (cost=0.00..0.00 rows=1 width=140)    
                            ->  Seq Scan on pic_isup_new_1_prt_1_2_prt_day5 pic_isup_new  (cost=0.00..0.00 rows=1 width=140)    
…
                      ->  Hash  (cost=11.10..11.10 rows=31 width=8) 
                            ->  Broadcast Motion 8:8  (slice1; segments: 8)  (cost=0.00..11.10 rows=31 width=8) 
                                  ->  Seq Scan on causas_liberacion  (cost=0.00..8.31 rows=4 width=8)   
                ->  Hash  (cost=583.94..583.94 rows=32 width=31)    
                      ->  Broadcast Motion 8:8  (slice3; segments: 8)  (cost=9.36..583.94 rows=32 width=31) 
                            ->  Hash Join  (cost=9.36..581.06 rows=4 width=31)  
                                  Hash Cond: series.serie = departamento.serie  
                                  ->  Seq Scan on series  (cost=0.00..488.44 rows=4143 width=22)    
                                  ->  Hash  (cost=8.96..8.96 rows=4 width=41)   
                                        ->  Redistribute Motion 8:8  (slice2; segments: 8)  (cost=0.00..8.96 rows=4 width=41)   
                                              Hash Key: departamento.serie  
                                              ->  Seq Scan on departamento  (cost=0.00..8.32 rows=4 width=41)   
    ->  Hash  (cost=121.89..121.89 rows=1024 width=6)   
          ->  Seq Scan on centrales  (cost=0.00..121.89 rows=1024 width=6)  

Settings: effective_cache_size=48GB; optimizer=on

2

There are 2 answers

1
Venkatesh Raghavan On

Can you please add the explain plan? Is this with the legacy planner or pivotal query optimizer? Thanks.

V

2
Brendan Stephens On

String functions can be a real drag on some queries -- But I'm more curious about your statement memory. An explain plan will show what planner you are using, and how much memory is being allocated and wanted.

If you are using the default 128mb statement_mem, you could be doing a lot of work on disk, which is the database performance killer.