Calculating average with biginteger time intervals using TimescaleDB

510 views Asked by At

I have a schema with the following fields:

       Name of row          |   Type    
  --------------------------+--------
  name                      | string
  value1                    | numeric
  timestamp                 | bigint

The rows contain entries with a name, a numeric value and a bigint value storing the unix timestamp in nanoseconds. Using TimescaleDB and I would like to use time_buckets_gapfill to retrieve the data. Given the timestamps are stored in bigint, this is quite cumbersome.

I would like to get aggregated data for these intervals: 5min, hour, day, week, month, quarter, year. I have managed to make it work using normal time_buckets, but now I would like to fill the gaps as well. I am using the following query now:

SELECT COALESCE(COUNT(*), 0), COALESCE(SUM(value1), 0), time_bucket_gapfill('5 min', date_trunc('quarter', to_timestamp(timestamp/1000000000)), to_timestamp(1599100000), to_timestamp(1599300000)) AS bucket
FROM playground
WHERE name = 'test' AND timestamp >= 1599100000000000000 AND timestamp <= 1599300000000000000
GROUP BY bucket
ORDER BY bucket ASC

This returns the values correctly, but does not fill the empty spaces. If I modified my query to

time_bucket_gapfill('5 min', 
                    date_trunc('quarter', 
                               to_timestamp(timestamp/1000000000), 
                               to_timestamp(1599100000), 
                               to_timestamp(1599200000))

I would get the first entry correctly and then empty rows every 5 minutes. How could I make it work? Thanks!

Here is a DB fiddle, but it doesn't work as it doesn't support TimeScaleDB. The query above returns the following:

  coalesce |       coalesce       |         avg_val
------------------------+-------------------------
     3     |          300         |  2020-07-01 00:00:00+00
     0     |           0          |  2020-09-03 02:25:00+00   
     0     |           0          |  2020-09-03 02:30:00+00
2

There are 2 answers

1
Sven Klemm On BEST ANSWER

You should use datatypes in your time_bucket_gapfill that matches the datatypes in your table. The following query should get you what you are looking for:

SELECT
  COALESCE(count(*), 0),
  COALESCE(SUM(value1), 0),
  time_bucket_gapfill(300E9::BIGINT, timestamp) AS bucket
FROM
  t
WHERE
  name = 'example'
  AND timestamp >= 1599100000000000000
  AND timestamp < 1599200000000000000
GROUP BY
  bucket;
0
Jani On

I have managed to solve it by building on Sven's answer. It first uses his function to fill out the gaps and then date_trunc is called eliminating the extra rows.

WITH gapfill AS (
    SELECT 
        COALESCE(count(*), 0) as count, 
        COALESCE(SUM(value1), 0) as sum,
        time_bucket_gapfill(300E9::BIGINT, timestamp) as bucket
    FROM
        playground
    WHERE
        name = 'test'
        AND timestamp >= 1599100000000000000
        AND timestamp < 1599300000000000000
    GROUP BY
        bucket
)
SELECT
    SUM(count),
    SUM(sum),
    date_trunc('quarter', to_timestamp(bucket/1000000000)) as truncated
FROM 
    gapfill
GROUP BY truncated
ORDER BY truncated ASC