How do I interpolate at specific non-uniform times in AWS Timestream?

27 views Asked by At

I have a table that looks something like this:

level1 level2 state measure_name time value
1 1 opening memory 30:00 10
1 1 open memory 30:01 11
1 2 opening memory 31:02 20
1 2 open memory 31:03 21
1 1 close memory 31:04 0
1 2 open memory 31:04 10
1 1 close memory 32:04 0

Notice that the times are not uniform; there are differing periods between each entry in the table. What I'm trying to do is sum the last value for each unique level2 value of all entries having level1 = 1 at each of the point in time for which I have a level1 reading. Imagine an OS having processes run in parallel. One process starts and reports some memory usage, as the process runs it reports a higher memory usage. At this point in time, the total memory usage is the last reported usage of the only process running. A second process begins and does the same type of thing. As it initially starts it reports its memory usage so the total usage at that point in time is its memory plus the last reported usage of the first process. Process 2 now reports a new usage figure and at that point the total memory useage is the sum of the last value reported by each of the running processes. The first process now exits and reports a usage of 0. The total usage at that point can be viewed exactly the same as before, the sum of the last reported usage as the completed process reported 0 usage. (In and ideal world we'd like to exclude any level2 records where the last reported value == 0 entirely from the current calculation so we don't end up summing stuff that won't contribute to the current total, but let's not try to run before we can walk! :o)

Essentially, I'm trying to get a list of the green circles on this stacked graph of the usage:

Graph of usage over time

level1 time total_value
1 30:00 10
1 30:01 11
1 31:02 31
1 31:03 32
1 31:04 10
1 32:04 0

What I've been trying to do is to create an array of all the unique timestamps for which I have readings to pass into the interpolate_locf(timeseries, array[timestamp]) function so could construct a dataset containing a value for each level2 item at every sample time but that's not working.

WITH sample_times AS (
    SELECT
        array_sort(array_agg(time)) time_array
    FROM
        my_table
    WHERE
        level1 = 1
        AND time > '2024-01-29 15:30:59'
)
SELECT
    level1,
    level2,
    interpolate_locf(
        CREATE_TIME_SERIES(time, value), (select time_array from sample_times)) memory
FROM
    my_table
WHERE
    level1 = 1
    AND time > '2024-01-29 15:30:59'
GROUP BY
    level1,
    level2

Gives the following error message:

Cannot interpolate outside of timeseries defined time range. (Service: AmazonTimestreamQuery; Status Code: 400; Error Code: ValidationException; Request ID: S4UMO2H4ERGGKTYEPYF42MZ7GA; Proxy: null)

How do I do what I'm trying to do? I'm really trying to avoid bucketing as the risk there is I have multiple additional usage events within the bucket before multiple closes and they will be netted off incorrectly within the bucked duration to give too low total max usage.

0

There are 0 answers