I want to compare values with values from following day (and also want to see which color is new or didn't appear).
I've did full outer self join and replaced null values in the section on the right of 'is_matched'. Is_matched is showing us if join worked or section on the right would be null without coalesce.
The only thing is very last column 'this_is_not_working'. It should have total_colours value for 'date_local2', not for 'date_local' and I couldn't figure out how to replace all nulls with values in 'this_is_not_working' column. I've tried with window functions and intervals but didn't really work.
I have used Postgres to create this db fiddle but I'm using Presto.
select * from colours
date_local | colour | amount :--------- | :----- | -----: 2020-01-01 | white | 10 2020-01-01 | white | 10 2020-01-01 | green | 20 2020-01-01 | white | 10 2020-01-01 | red | 25 2020-01-01 | white | 10 2020-01-02 | pink | 15 2020-01-02 | pink | 15 2020-01-02 | pink | 15 2020-01-02 | pink | 15 2020-01-02 | white | 10 2020-01-02 | white | 10 2020-01-02 | white | 10 2020-01-02 | white | 10 2020-01-02 | white | 10 2020-01-03 | pink | 15 2020-01-03 | pink | 15 2020-01-03 | pink | 15 2020-01-03 | green | 20 2020-01-03 | green | 20 2020-01-03 | green | 20
with a as(
select
*
,sum(colours) over(partition by date_local) as total_colour
from (
select
date_local
,colour
,count(colour) as colours
,sum(amount) as amount
from colours
group by 1,2
) as fr_om
)
select
a.*
,b.date_local as is_matched
,coalesce(b.date_local, a.date_local + interval '1' day) as date_local_2
,coalesce(b.colour, a.colour) as colour_2
,coalesce(b.colours, 0) as colour_2
,coalesce(b.amount, 0) as amount_2
,coalesce(b.colours - a.colours, a.colours) as colour_difference
,coalesce(b.amount - a.amount, a.amount) as amount_difference
,b.total_colour as this_is_not_working
from a
full outer join a as b
on a.date_local = b.date_local - interval '1' day
and a.colour = b.colour
order by 1
date_local | colour | colours | amount | total_colour | is_matched | date_local_2 | colour_2 | colour_2 | amount_2 | colour_difference | amount_difference | this_is_not_working :--------- | :----- | ------: | -----: | -----------: | :--------- | :------------------ | :------- | -------: | -------: | ----------------: | ----------------: | ------------------: 2020-01-01 | red | 1 | 25 | 6 | null | 2020-01-02 00:00:00 | red | 0 | 0 | 1 | 25 | null 2020-01-01 | green | 1 | 20 | 6 | null | 2020-01-02 00:00:00 | green | 0 | 0 | 1 | 20 | null 2020-01-01 | white | 4 | 40 | 6 | 2020-01-02 | 2020-01-02 00:00:00 | white | 5 | 50 | 1 | 10 | 9 2020-01-02 | pink | 4 | 60 | 9 | 2020-01-03 | 2020-01-03 00:00:00 | pink | 3 | 45 | -1 | -15 | 6 2020-01-02 | white | 5 | 50 | 9 | null | 2020-01-03 00:00:00 | white | 0 | 0 | 5 | 50 | null 2020-01-03 | pink | 3 | 45 | 6 | null | 2020-01-04 00:00:00 | pink | 0 | 0 | 3 | 45 | null 2020-01-03 | green | 3 | 60 | 6 | null | 2020-01-04 00:00:00 | green | 0 | 0 | 3 | 60 | null null | null | null | null | null | 2020-01-02 | 2020-01-02 00:00:00 | pink | 4 | 60 | null | null | 9 null | null | null | null | null | 2020-01-01 | 2020-01-01 00:00:00 | white | 4 | 40 | null | null | 6 null | null | null | null | null | 2020-01-03 | 2020-01-03 00:00:00 | green | 3 | 60 | null | null | 6 null | null | null | null | null | 2020-01-01 | 2020-01-01 00:00:00 | green | 1 | 20 | null | null | 6 null | null | null | null | null | 2020-01-01 | 2020-01-01 00:00:00 | red | 1 | 25 | null | null | 6
I don't think that you need a
full join
for this. Window functions can get the job done:The inner query aggregates by day and color, and computes group-level metrics as well as the overall count of records per day; it also sets a flag that indicates if an "adjacent" record exist on the next day for the same color.
Then, the outer query uses window function
lead()
to recover the values from the adjacent row.In your db fiddle, this yields: