There is a table (mileage_event) which shows mileage of all cars. Contains the following columns: created_at - date, car_id - car id, car_mileage - mileage of a car
It is worth considering that not all days have data. For example:
car_id | created_at | car_mileage |
---|---|---|
d016e61c5326-f246894a | 2023-05-22 | 100000 |
d016e61c5326-f246894a | 2023-05-26 | 105000 |
d016e61c5326-f246894a | 2023-05-27 | 110000 |
As you can see, there are no rows for this car between 2023-05-22 and 2023-05-26 (i.e. 3 days are absent). It means that the car hadn't been on drive for these days.
What do I want: to generate the missing days and add the nearest mileage value to them.
(Note: it's Exasol database)
As a result:
car_id | created_at | car_mileage |
---|---|---|
d016e61c5326-f246894a | 2023-05-22 | 100000 |
d016e61c5326-f246894a | 2023-05-23 | 100000 |
d016e61c5326-f246894a | 2023-05-24 | 100000 |
d016e61c5326-f246894a | 2023-05-25 | 100000 |
d016e61c5326-f246894a | 2023-05-26 | 105000 |
d016e61c5326-f246894a | 2023-05-27 | 110000 |
Thanks in advance!
UPD: I could be able to come up with this but it seems I'm wrong somewhere and do not see an error:
WITH date_series AS (select add_days(current_date - interval '2' year, level - 1) as dates
from dual
connect by level <= days_between(current_date,current_date - interval '2' year) + 1
order by local.dates) --to constantly get only last 2 years
,
combined AS (
SELECT
m.car_id as car_id,
d.dates,
COALESCE(m.car_mileage,
LAG(m.car_mileage, 1) OVER (PARTITION BY m.car_id ORDER BY d.dates)) AS car_mileage
FROM date_series d
LEFT JOIN mileage_event m ON d.dates = m.created_at
)
SELECT * FROM combined ORDER BY car_id, dates
And still I get this: | car_id | created_at | car_mileage | | --- | --- | --- | | d016e61c5326-f246894a | 2023-05-22 | 100000 | | null | 2023-05-23 | null | | null | 2023-05-24 | null | | null | 2023-05-25 | null | | d016e61c5326-f246894a | 2023-05-26 | 105000 | | d016e61c5326-f246894a | 2023-05-27 | 110000 |
I get that it's left join so it obviously cannot find the right values (and shows null instead).
Ah, always good to see someone asking a legitimate question and people commenting "You should google it".
You can achieve this by leveraging common table expressions (CTEs), window functions, and joining with a series of dates. The general strategy is to:
Generate a series of dates spaning the range you're interested in. For each car_id, use a LEFT JOIN to combine this date series with your existing mileage data. Employ window functions to fill in the missing mileage data. Here is how i would do it:
I hope that answer your question and works. Let me know if you need more help.
UPDATE
The issue is probably the LAG function in the combined CTE. When you are using the LAG function within the LEFT JOIN context, it will not have a car_id value to partition by for the rows that have a null car_mileage. This is why the LAG function doesn't return a value for these rows.