Calculate weighted average for each day and id based on time intervals in PostgreSQL

Question

Asked by Niklas on 03 January 2025 at 15:15 · 1k views

I have a table in a PostgreSQL database that looks like this:

stid | e5   | e10  | diesel | date
-----+------+------+--------+------------------------
e850 | 1300 | 1400 | 1500   | 2016-05-02 05:30:01+02
e850 | 1400 | 1500 | 1700   | 2016-05-02 08:30:01+02
e850 | 1300 | 1400 | 1500   | 2016-05-02 21:00:01+02
e850 | 1200 | 1300 | 1350   | 2016-05-03 10:30:01+02
e850 | 1300 | 1400 | 1500   | 2016-05-03 21:00:01+02
954d | 1200 | 1100 | 1300   | 2016-05-02 03:30:01+02
954d | 1300 | 1100 | 1300   | 2016-05-02 15:00:01+02
954d | 1400 | 1800 | 1400   | 2016-05-02 22:30:01+02
954d | 1700 | 1900 | 1400   | 2016-05-03 09:30:01+02
954d | 1500 | 1900 | 1200   | 2016-05-03 23:30:01+02

Each row has an id (stid), three prices (e5, e10, diesel), and a timestamp (date) that indicates when those prices were introduced. I want to calculate the average price per day and stid, weighted by how long each price was charged, and I only want to take the period between 8 am and 8 pm into account.
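To try the queries on this page, the sample data can be loaded with something like the sketch below. The column types are assumptions, since the question does not state them. The `+02` offsets are dropped and a plain `timestamp` is used so that `date_trunc('day', ...)` does not depend on the session time zone.

```sql
-- Sketch of the sample table. Column types are assumed, not taken from the question.
CREATE TABLE mytable (
    stid   text      NOT NULL,
    e5     numeric   NOT NULL,
    e10    numeric   NOT NULL,
    diesel numeric   NOT NULL,
    date   timestamp NOT NULL   -- time the prices were introduced
);

-- Rows copied from the table above, without the +02 offset.
INSERT INTO mytable (stid, e5, e10, diesel, date) VALUES
    ('e850', 1300, 1400, 1500, '2016-05-02 05:30:01'),
    ('e850', 1400, 1500, 1700, '2016-05-02 08:30:01'),
    ('e850', 1300, 1400, 1500, '2016-05-02 21:00:01'),
    ('e850', 1200, 1300, 1350, '2016-05-03 10:30:01'),
    ('e850', 1300, 1400, 1500, '2016-05-03 21:00:01'),
    ('954d', 1200, 1100, 1300, '2016-05-02 03:30:01'),
    ('954d', 1300, 1100, 1300, '2016-05-02 15:00:01'),
    ('954d', 1400, 1800, 1400, '2016-05-02 22:30:01'),
    ('954d', 1700, 1900, 1400, '2016-05-03 09:30:01'),
    ('954d', 1500, 1900, 1200, '2016-05-03 23:30:01');
```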

To calculate the weighted average price of e5 for stid e850 and the date 2016-05-02 between 8 am and 8 pm I would do the following:

(1300 * 1801 + 1400 * 41399) / 43200 = 1395.83102

Here 1300 is the price that was set at 5:30:01 am, and 1801 is the duration in seconds between 8 am and 8:30:01 am. 1400 is the price that was set at 8:30:01 am, and 41399 is the duration in seconds between 8:30:01 am and 8 pm. The divisor 43200 is the length of the 12-hour window in seconds (1801 + 41399).
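The arithmetic above can be checked directly in PostgreSQL. This is only a sanity check of the worked example. All numbers come from the question, and the column aliases are made up for illustration.

```sql
-- Worked check of the e5 example for stid e850 on 2016-05-02.
SELECT
    -- seconds the 05:30:01 price (1300) was in effect inside the window
    extract(epoch FROM timestamp '2016-05-02 08:30:01'
                     - timestamp '2016-05-02 08:00:00') AS secs_at_1300,
    -- seconds the 08:30:01 price (1400) was in effect inside the window
    extract(epoch FROM timestamp '2016-05-02 20:00:00'
                     - timestamp '2016-05-02 08:30:01') AS secs_at_1400,
    -- the two weights add up to the 12-hour window: 1801 + 41399 = 43200
    round((1300 * 1801 + 1400 * 41399) / 43200.0, 5)    AS weighted_avg_e5;
```

The two durations come out as 1801 and 41399 seconds, and the last column is 1395.83102.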

In the end I would like to have a table that looks like this:

stid | date       | average_e5 | average_e10 | average_diesel
-----+------------+------------+-------------+---------------
e850 | 2016-05-02 | 1395.83102 | 1495.83102  | 1691.66204
e850 | 2016-05-03 | 1220.83565 | 1320.83565  | 1381.25347
954d | 2016-05-02 | 1241.66435 | 1100        | 1300
954d | 2016-05-03 | 1662.49306 | 1887.49769  | 1400

Edit: Solution

The code below from Vao Tsun's answer does almost everything that I am looking for. However, when a day and id have no price before 8 am or after 8 pm, I don't get the weighted average that I am looking for. The query partitions by day, so a price carried over from the previous day is not counted, and the last price of a day is not counted up to 8 pm unless a later row exists on that day. By creating dummy entries for the cases when there are no prices before 8 am or after 8 pm, I was able to solve this problem.
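To see which days are affected, a diagnostic query along these lines can help. It is a sketch that assumes the table is called mytable, as in the answer. The two flag column names are hypothetical.

```sql
-- List each (stid, day) and flag the days where the per-day query misses
-- part of the 08:00-20:00 window.
WITH t AS (
    SELECT stid, date, date_trunc('day', date) AS dt
    FROM mytable
)
SELECT stid,
       dt,
       -- first price change of the day is after 08:00, so the morning
       -- price has to come from the previous day
       min(date) > dt + interval '8 hours'  AS no_price_before_8am,
       -- last price change of the day is before 20:00, so nothing
       -- closes the final interval
       max(date) < dt + interval '20 hours' AS no_price_after_8pm
FROM t
GROUP BY stid, dt
ORDER BY stid DESC, dt;
```

On the sample data, this flags 2016-05-03 for both e850 and 954d in the first column, which are the two days where the answer's query returns 1200 and 1700 instead of the expected averages.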

I used the following code to create a new table called mytable2 that includes the dummy entries.

DROP TABLE IF EXISTS mytable2;

CREATE TABLE mytable2 AS SELECT * FROM mytable;

WITH c AS (
SELECT
    *,
    LAG(date) OVER(PARTITION BY stid ORDER BY date) AS lag_date,
    LAG(e5) OVER(PARTITION BY stid ORDER BY date) AS lag_e5,
    LAG(e10) OVER(PARTITION BY stid ORDER BY date) AS lag_e10,
    LAG(diesel) OVER(PARTITION BY stid ORDER BY date) AS lag_diesel
FROM mytable
)

INSERT INTO mytable2
SELECT
    stid,
    lag_e5 AS e5,
    lag_e10 AS e10,
    lag_diesel AS diesel,
    date_trunc('day', date) + '0 hours'::interval AS date
FROM c WHERE lag_date < date_trunc('day', date) + '0 hours'::interval
AND date > date_trunc('day', date) + '8 hours'::interval;

WITH d AS (
SELECT
    *,
    LEAD(date) OVER(PARTITION BY stid ORDER BY date) AS lead_date
FROM mytable
)

INSERT INTO mytable2
SELECT
    stid,
    e5,
    e10,
    diesel,
    date_trunc('day', date) + '23 hours'::interval AS date
FROM d WHERE lead_date >= date_trunc('day', date) + '24 hours'::interval
AND date < date_trunc('day', date) + '20 hours'::interval;

Then I can just run the code from Vao Tsun's answer to get the desired weighted averages. I only changed mytable to mytable2, to use the table with the added dummy entries.

with a as (
select *
, case
  when date < date_trunc('day', date) + '8 hours'::interval then date_trunc('day', date) + '8 hours'::interval
  when date > date_trunc('day', date) + '20 hours'::interval then date_trunc('day', date) + '20 hours'::interval
  else date
end d
, date_trunc('day', date) dt
from mytable2
)
, b as (
select stid, e5, e10, diesel,date,d, dt
, extract(epoch from lead(d) over (partition by stid,dt order by stid,d) - d) diff
from a
)
select DISTINCT
 stid, dt,sum(e5*diff*1.0) over (partition by stid,dt)/sum(diff) over (partition by stid,dt) e5_weight_avg
from b
order by stid desc, dt;

stid |         dt          |  e5_weight_avg
-----+---------------------+-----------------
e850 | 2016-05-02 00:00:00 | 1395.83101851852
e850 | 2016-05-03 00:00:00 | 1220.83564814815
954d | 2016-05-02 00:00:00 | 1241.66435185185
954d | 2016-05-03 00:00:00 | 1662.49305555556

The code can also be found on rextester.
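The query above only returns the e5 average, while the desired table also has e10 and diesel. A possible extension is sketched below. It assumes mytable2 exists as created earlier. It differs from the original in three ways: it uses GROUP BY instead of DISTINCT with window sums, it orders the window by the original timestamp so that rows clamped to the same time keep a deterministic order, and it wraps the divisor in nullif to avoid a division by zero. None of these changes come from the answer itself.

```sql
-- Sketch: same clamping approach, extended to all three prices.
WITH a AS (
    SELECT stid, e5, e10, diesel, date,
           date_trunc('day', date) AS dt,
           -- clamp each timestamp into the 08:00-20:00 window of its own day
           CASE
               WHEN date < date_trunc('day', date) + interval '8 hours'
                   THEN date_trunc('day', date) + interval '8 hours'
               WHEN date > date_trunc('day', date) + interval '20 hours'
                   THEN date_trunc('day', date) + interval '20 hours'
               ELSE date
           END AS d
    FROM mytable2
),
b AS (
    SELECT stid, dt, e5, e10, diesel,
           -- seconds until the next clamped price change on the same day;
           -- NULL for the last row of each day, which sum() then ignores
           extract(epoch FROM lead(d) OVER (PARTITION BY stid, dt ORDER BY date) - d) AS diff
    FROM a
)
SELECT stid,
       dt::date AS day,
       sum(e5 * diff)     / nullif(sum(diff), 0) AS average_e5,
       sum(e10 * diff)    / nullif(sum(diff), 0) AS average_e10,
       sum(diesel * diff) / nullif(sum(diff), 0) AS average_diesel
FROM b
GROUP BY stid, dt
ORDER BY stid DESC, dt;
```

On the sample data this should reproduce the desired table from the question, up to rounding.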

Original Q&A

There is 1 answer

**Vao Tsun** · Accepted Answer · 2017-09-11T13:02:16+00:00

I used a few CTEs that are not strictly needed, to make it more readable:

t=# with a as (
select *
, case
  when date < date_trunc('day', date) + '8 hours'::interval then date_trunc('day', date) + '8 hours'::interval
  when date > date_trunc('day', date) + '20 hours'::interval then date_trunc('day', date) + '20 hours'::interval
  else date
end d
, date_trunc('day', date) dt
from mytable
)
, b as (
select stid, e5, e10, diesel,date,d, dt
, extract(epoch from lead(d) over (partition by stid,dt order by stid,d) - d) diff
from a
)
select
 stid, e5,date,d, diff,sum(e5*diff*1.0) over (partition by stid,dt)/sum(diff) over (partition by stid,dt) e5_weight_avg
from b
order by stid desc, date;
 stid |   e5    |        date         |          d          | diff  |  e5_weight_avg
------+---------+---------------------+---------------------+-------+------------------
 e850 | 1300.00 | 2016-05-02 05:30:01 | 2016-05-02 08:00:00 |  1801 | 1395.83101851852
 e850 | 1400.00 | 2016-05-02 08:30:01 | 2016-05-02 08:30:01 | 41399 | 1395.83101851852
 e850 | 1300.00 | 2016-05-02 21:00:01 | 2016-05-02 20:00:00 |       | 1395.83101851852
 e850 | 1200.00 | 2016-05-03 10:30:01 | 2016-05-03 10:30:01 | 34199 |             1200
 e850 | 1300.00 | 2016-05-03 21:00:01 | 2016-05-03 20:00:00 |       |             1200
 954d | 1200.00 | 2016-05-02 03:30:01 | 2016-05-02 08:00:00 | 25201 | 1241.66435185185
 954d | 1300.00 | 2016-05-02 15:00:01 | 2016-05-02 15:00:01 | 17999 | 1241.66435185185
 954d | 1400.00 | 2016-05-02 22:30:01 | 2016-05-02 20:00:00 |       | 1241.66435185185
 954d | 1700.00 | 2016-05-03 09:30:01 | 2016-05-03 09:30:01 | 37799 |             1700
 954d | 1500.00 | 2016-05-03 23:30:01 | 2016-05-03 20:00:00 |       |             1700
(10 rows)

And thus, skipping the intermediate steps:

t=# with a as (
select *
, case
  when date < date_trunc('day', date) + '8 hours'::interval then date_trunc('day', date) + '8 hours'::interval
  when date > date_trunc('day', date) + '20 hours'::interval then date_trunc('day', date) + '20 hours'::interval
  else date
end d
, date_trunc('day', date) dt
from mytable
)
, b as (
select stid, e5, e10, diesel,date,d, dt
, extract(epoch from lead(d) over (partition by stid,dt order by stid,d) - d) diff
from a
)
select DISTINCT
 stid, dt,sum(e5*diff*1.0) over (partition by stid,dt)/sum(diff) over (partition by stid,dt) e5_weight_avg
from b
order by stid desc, dt;
 stid |         dt          |  e5_weight_avg
------+---------------------+------------------
 e850 | 2016-05-02 00:00:00 | 1395.83101851852
 e850 | 2016-05-03 00:00:00 |             1200
 954d | 2016-05-02 00:00:00 | 1241.66435185185
 954d | 2016-05-03 00:00:00 |             1700
(4 rows)
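The CASE expression in CTE a clamps each timestamp into the 08:00 to 20:00 window of its own day. An equivalent, shorter way to write that clamp is with greatest and least. The sketch below is not part of the answer. It uses three e850 timestamps from the question, inlined as literals so it runs without the table.

```sql
-- Clamp timestamps into [08:00, 20:00] of their own day.
SELECT ts,
       least(
           greatest(ts, date_trunc('day', ts) + interval '8 hours'),   -- not before 08:00
           date_trunc('day', ts) + interval '20 hours'                 -- not after 20:00
       ) AS d
FROM (VALUES (timestamp '2016-05-02 05:30:01'),
             (timestamp '2016-05-02 08:30:01'),
             (timestamp '2016-05-02 21:00:01')) AS v(ts);
```

The three rows clamp to 08:00:00, 08:30:01 and 20:00:00, matching the d column in the first result above.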
