Postgresql - LEFT JOIN LATERAL is too slow than subquery

5.9k views Asked by At

I have some difficulty using 'LEFT JOIN LATERAL' function with postgresql 9.5.

In my table, there are three columns of 'ID', 'DATE', 'CODE'. One person (ID) have multiple rows as below. The number of ID is 362 and total row number is about 2500000.

ID   /  DATE     / CODE
1    /  20020101 / drugA
1    /  20020102 / drugA
1    /  20020103 / drugB
1    /  20020104 / drugA
1    /  20020105 / drugA
1    /  20020106 / drugB
1    /  20020107 / drugA
2    /  ...      / ...

I need to summarize information of drug A used between the first day and last day of drugB.

In the above case, only two rows should be remained for ID (1) [between 20020103 ~ 20020106; the period of drugB].

1    /  20020104 / drugA
1    /  20020105 / drugA

To take this job, I write SQL code using 'LEFT LATERAL JOIN' as below.

SELECT * FROM (SELECT ID, min(DATE) as start_date, max(DATE) as end_date from MAIN_TABLE WHERE CODE = 'drugA' GROUP BY ID) AA
LEFT JOIN LATERAL (SELECT ID, COUNT(ID) as no_tx, min(DATE) as fday_tx, max(DATE) lday_tx from MAIN_TABLE WHERE CODE = 'drugB' AND DATE > AA.start_date AND DATE < AA.end_date GROUP BY ID) as BB USING(ID);

There are only 362 person ID, but this postgresql code take about 2 mins.

It's too slow. Therefore, I tried another SQL code using subquery.

SELECT * FROM (SELECT ID, min(DATE) as start_date, max(DATE) as end_date from MAIN_TABLE WHERE CODE ='drugA' GROUP BY ID) AA
LEFT JOIN (
       SELECT ID, COUNT(ID) as no_tx, min(DATE) as fday_tx, max(DATE) lday_tx FROM (SELECT ID, DATE, CODE FROM MAIN_TABLE) BB
            LEFT JOIN (SELECT ID, min(DATE) as start_date, max(DATE) as end_date from MAIN_TABLE WHERE CODE ='drugA' GROUP BY ID) CC USING (ID)
            WHERE CODE = 'drugB' and DATE > start_date and DATE < end_date GROUP BY ID
            ) DD USING (ID);

This code is not simple but very fast (take only 1.6 sec).

When I compare the explain of two codes, the second code use hash join, but the first code do not.

Can I get some hint to improve the first code with 'LEFT LATERAL JOIN' function more efficiently?

1

There are 1 answers

1
Gordon Linoff On BEST ANSWER

Why not just use a join and group by?

SELECT AA.ID, COUNT(B.ID) as no_tx, min(B.DATE) as fday_tx, max(B.DATE) as lday_tx,
       AA.start_date, AA.end_date
FROM (SELECT ID, min(DATE) as start_date, max(DATE) as end_date 
      FROM MAIN_TABLE
      WHERE CODE = 'drugA'
      GROUP BY ID
     ) AA LEFT JOIN
     MAIN_TABLE b
     ON b.CODE = 'drugB' AND b.DATE > AA.start_date AND b.DATE < AA.end_date
GROUP BY AA.ID,  AA.start_date, AA.end_date;

Or, perhaps more efficiently, window functions:

SELECT ID, SUM(CASE WHEN code = 'drugB' THEN 1 ELSE 0 END) as no_tx,
       MIN(CASE WHEN code = 'drugB' THEN DATE END) as fday_tx,
       MIN(CASE WHEN code = 'drugB' THEN DATE END) as lday_tx,
       start_date, end_date
FROM (SELECT t.*,
             MIN(CASE WHEN code = 'drugA' THEN date END) as start_date,
             MAX(CASE WHEN code = 'drugB' THEN date END) as end_date
      FROM MAIN_TABLE t
     ) t
WHERE code in ('drugA', 'drugB') AND
      date between start_date and end_date
GROUP BY t.id;