Saving only unique datapoints in SQL

44 views Asked by At

For simplicity: We have a table with 2 columns, value and date.

Every second a new data is received and we want to save it with it's timestamp. Since the data can be similar, to lower usage, if data is the same as previous entry, we don't save it.

Question: Given that same value was received during 24 hours, only the first value & date pair is saved. If we want to query 'Average value in last 1 hour', is there a way to have the db (PostgreSQL) see that no values are saved in last hour and search for last existing value entry?

2

There are 2 answers

0
GMB On

It is not as easy as it may seem, and it is not just about retrieving the latest data point when there is none available within the last hour. You want to calculate an average, so you need to rebuild the time-series data of the period, second per second, filling the gaps with the latest available data point.

I think the simplest approach is generate_series() to build the rows, and then a lateral join to recover the data:

select avg(d.value) avg_last_hour
from generate_series(now() - interval '1 hour', now(), '1 second') t(ts)
cross join lateral (
    select d.*
    from data d
    where d.date <= t.ts
    order by d.date desc
    limit 1
) t
0
Gordon Linoff On

Hmmm . . . if you simply want the average of values in the most recent hour in the data, you can use:

select date_trunc('hour', date) as ddhh, avg(value)
from t
group by ddhh
order by ddhh desc
limit 1;

If you have a lot of data being collected, it might be faster to add an index on date and use:

select avg(value)
from t
where date >= date_trunc('hour', (select max(t2.date) from t t2));