Timescaledb Hierarchical continuous aggregates generate duplicated time bucket

Question

Timescaledb Hierarchical continuous aggregates generate duplicated time bucket

189 views Asked by chdegrave At 19 December 2023 at 15:08

I'm new in Timescaledb and I'm exploring the continuous aggregates. I use the docker image timescale/timescaledb-ha:pg14 (PostgreSQL 14 with Timescaledb v2.11.1)

I have historical data which are half-hourly values for several devices and I want to aggregate these values in hourly values and then in yearly values. I've created a hypertable to store half-hourly data. Then, I created a first continuous aggregate for the hourly values and on top a second continuous aggregate for the yearly values. Everything seems to be going well but at the end for the yearly aggregates I have a duplicate time bucket for each device for the last year.

Create hypertable

CREATE TABLE observation_hah_timescaledb (
    device_id integer NOT NULL,
    observation_timestamp TIMESTAMPTZ NOT NULL,
    value double precision
);

SELECT create_hypertable('observation_hah_timescaledb','observation_timestamp');

Insert data

I insert a data set from 2018 to 2022 in the hypertable. For a historical reason, the timestamp of these data points to the end of the time period. So I have to shift the data by half an hour. I replace the empty values (-9999) with NULL values.

INSERT INTO observation_hah_timescaledb (
                                                observation_timestamp,
                                                device_id,
                                                value
)
SELECT
    date_time - INTERVAL '30 min',
    device_id,
    NULLIF(value, -9999)
FROM observation_hah
WHERE
    date_time >= '2018-01-01 00:00:00';

Continuous aggregate 1 (hourly)

With the half-hourly values, I aggregate data by hour.

CREATE MATERIALIZED VIEW observation_hourly_cont_agg
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1hour', observation_timestamp) AS time_bucket,
    device_id,
    AVG(value) AS hourly_value,
    count(value)/2 AS data_acq
FROM
    observation_hah_timescaledb
GROUP BY
    time_bucket('1hour', observation_timestamp), device_id;

Select data

I select the data for one device and for the last year and I get 8760 rows (24*365). The aggregation values are correct.

SELECT
    *
FROM observation_hourly_cont_agg
WHERE
    device_id=6627
AND
    time_bucket>='2022-01-01';

Continuous aggregate 2 (yearly)

I create an continuous aggregate for the yearly values on top of the first continuous aggregate (hourly values).

CREATE MATERIALIZED VIEW observation_yearly_cont_agg
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1year', observation_hourly_cont_agg.time_bucket) AS time_bucket,
    device_id,
    AVG(hourly_value) AS avg_year_value,
    COUNT(observation_hourly_cont_agg.time_bucket),
    MIN(observation_hourly_cont_agg.time_bucket),
    MAX(observation_hourly_cont_agg.time_bucket)

FROM
    observation_hourly_cont_agg
WHERE
    data_acq = 1
GROUP BY
    time_bucket('1year', observation_hourly_cont_agg.time_bucket), device_id
;

Select data

I select the data for the device. These values are correct, but I get an extra row for the last year for each device. This last row includes only a part of the data from september to december.

SELECT
    *
FROM observation_yearly_cont_agg
WHERE
    device_id=6627;

time_bucket	device_id	avg_year_value	count	min	max
2018-01-01 00:00:00.000000 +00:00	6627	11.25464132507152	8392	2018-01-01 03:00:00.000000 +00:00	2018-12-31 23:00:00.000000 +00:00
2019-01-01 00:00:00.000000 +00:00	6627	9.57086852388762	7933	2019-01-01 00:00:00.000000 +00:00	2019-12-31 23:00:00.000000 +00:00
2020-01-01 00:00:00.000000 +00:00	6627	8.461547559760978	8032	2020-01-01 01:00:00.000000 +00:00	2020-12-31 23:00:00.000000 +00:00
2021-01-01 00:00:00.000000 +00:00	6627	9.730555202642625	7871	2021-01-01 00:00:00.000000 +00:00	2021-12-31 23:00:00.000000 +00:00
2022-01-01 00:00:00.000000 +00:00	6627	9.444980719794348	7780	2022-01-01 00:00:00.000000 +00:00	2022-12-31 23:00:00.000000 +00:00
2022-01-01 00:00:00.000000 +00:00	6627	9.19524472741742	2513	2022-09-04 00:00:00.000000 +00:00	2022-12-31 23:00:00.000000 +00:00

Am I doing something wrong?

I found a post on the timescaledb forum that describes similar behavior: Continuous Aggregate gets wrong data and duplicated time bucket after refreshing

When I refresh the continuous aggregates nothing happens.

CALL refresh_continuous_aggregate('observation_hourly_cont_agg', '2021-01-01', '2023-01-01');

CALL refresh_continuous_aggregate('observation_yearly_cont_agg', '2021-01-01', '2023-01-01');

Can anyone help me?

Original Q&A

There are 1 answers

**chdegrave** · Answer 1 · 2023-12-22T16:09:28+00:00

I finally found a solution.

First, I tried to create continuous aggregates with no data but the views were populated with data.

CREATE MATERIALIZED VIEW observation_hourly_cont_agg
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1hour', observation_timestamp) AS time_bucket,
    device_id,
    AVG(value) AS hourly_value,
    count(value)/2 AS data_acq
FROM
    observation_hah_timescaledb
GROUP BY
    time_bucket('1hour', observation_timestamp), device_id
WITH NO DATA;

So, I have looked at this question: TimescaleDB Continuous Aggregates WITH NO DATA not working

It is suggested to disable the real time aggregation.

I re-created the continuous aggregates:

CREATE MATERIALIZED VIEW observation_hourly_cont_agg
WITH (timescaledb.continuous, timescaledb.materialized_only=True) AS
SELECT
    time_bucket('1hour', observation_timestamp) AS time_bucket,
    device_id,
    AVG(value) AS hourly_value,
    count(value)/2 AS data_acq
FROM
    observation_hah_timescaledb
GROUP BY
    time_bucket('1hour', observation_timestamp), device_id
WITH NO DATA;

CREATE MATERIALIZED VIEW observation_yearly_cont_agg
WITH (timescaledb.continuous, timescaledb.materialized_only=True) AS
SELECT
    time_bucket('1year', observation_hourly_cont_agg.time_bucket) AS time_bucket,
    device_id,
    AVG(hourly_value) AS avg_year_value,
    COUNT(observation_hourly_cont_agg.time_bucket),
    MIN(observation_hourly_cont_agg.time_bucket),
    MAX(observation_hourly_cont_agg.time_bucket)

FROM
    observation_hourly_cont_agg
WHERE
    data_acq = 1
GROUP BY
    time_bucket('1year', observation_hourly_cont_agg.time_bucket), device_id
WITH NO DATA;

I populated them:

CALL refresh_continuous_aggregate('observation_hourly_cont_agg', '2017-01-01', '2023-01-01');
CALL refresh_continuous_aggregate('observation_yearly_cont_agg', '2017-01-01', '2023-01-01');

Now, I'm getting what I expected:

SELECT
    *
FROM observation_yearly_cont_agg
WHERE
    device_id=6627
ORDER BY
    time_bucket;

time_bucket	device_id	avg_year_value	count	min	max
2018-01-01 00:00:00.000000 +00:00	6627	11.254641325071463	8392	2018-01-01 03:00:00.000000 +00:00	2018-12-31 23:00:00.000000 +00:00
2019-01-01 00:00:00.000000 +00:00	6627	9.570868523887526	7933	2019-01-01 00:00:00.000000 +00:00	2019-12-31 23:00:00.000000 +00:00
2020-01-01 00:00:00.000000 +00:00	6627	8.461547559761014	8032	2020-01-01 01:00:00.000000 +00:00	2020-12-31 23:00:00.000000 +00:00
2021-01-01 00:00:00.000000 +00:00	6627	9.730555202642627	7871	2021-01-01 00:00:00.000000 +00:00	2021-12-31 23:00:00.000000 +00:00
2022-01-01 00:00:00.000000 +00:00	6627	9.444980719794358	7780	2022-01-01 00:00:00.000000 +00:00	2022-12-31 23:00:00.000000 +00:00

Pay attention to your version of the timescale (Timescale doc - Real time aggregates): "In Timescale versions 1.7 to 2.12, real time aggregates are enabled by default; when you create a continuous aggregate view, queries to that view include the most recent data, even if it has not yet been aggregated. In Timescale 2.13 and later real time aggregates are DISABLED by default."

TechQA.

Timescaledb Hierarchical continuous aggregates generate duplicated time bucket

Create hypertable

Insert data

Continuous aggregate 1 (hourly)

Select data

Continuous aggregate 2 (yearly)

Select data

There are 1 answers

Related Questions in TIME-SERIES

Related Questions in TIMESCALEDB

Related Questions in POSTGRESQL-14

Related Questions in CONTINUOUS-AGGREGATES

Popular Questions

Trending Questions