I have some difficulty using 'LEFT JOIN LATERAL' function with postgresql 9.5.
In my table, there are three columns of 'ID', 'DATE', 'CODE'. One person (ID) have multiple rows as below. The number of ID is 362 and total row number is about 2500000.
ID / DATE / CODE
1 / 20020101 / drugA
1 / 20020102 / drugA
1 / 20020103 / drugB
1 / 20020104 / drugA
1 / 20020105 / drugA
1 / 20020106 / drugB
1 / 20020107 / drugA
2 / ... / ...
I need to summarize information of drug A used between the first day and last day of drugB.
In the above case, only two rows should be remained for ID (1) [between 20020103 ~ 20020106; the period of drugB].
1 / 20020104 / drugA
1 / 20020105 / drugA
To take this job, I write SQL code using 'LEFT LATERAL JOIN' as below.
SELECT * FROM (SELECT ID, min(DATE) as start_date, max(DATE) as end_date from MAIN_TABLE WHERE CODE = 'drugA' GROUP BY ID) AA
LEFT JOIN LATERAL (SELECT ID, COUNT(ID) as no_tx, min(DATE) as fday_tx, max(DATE) lday_tx from MAIN_TABLE WHERE CODE = 'drugB' AND DATE > AA.start_date AND DATE < AA.end_date GROUP BY ID) as BB USING(ID);
There are only 362 person ID, but this postgresql code take about 2 mins.
It's too slow. Therefore, I tried another SQL code using subquery.
SELECT * FROM (SELECT ID, min(DATE) as start_date, max(DATE) as end_date from MAIN_TABLE WHERE CODE ='drugA' GROUP BY ID) AA
LEFT JOIN (
SELECT ID, COUNT(ID) as no_tx, min(DATE) as fday_tx, max(DATE) lday_tx FROM (SELECT ID, DATE, CODE FROM MAIN_TABLE) BB
LEFT JOIN (SELECT ID, min(DATE) as start_date, max(DATE) as end_date from MAIN_TABLE WHERE CODE ='drugA' GROUP BY ID) CC USING (ID)
WHERE CODE = 'drugB' and DATE > start_date and DATE < end_date GROUP BY ID
) DD USING (ID);
This code is not simple but very fast (take only 1.6 sec).
When I compare the explain of two codes, the second code use hash join, but the first code do not.
Can I get some hint to improve the first code with 'LEFT LATERAL JOIN' function more efficiently?
Why not just use a
join
andgroup by
?Or, perhaps more efficiently, window functions: