I have a question related to PromQL in SysDig.
First, some background: SysDig supports the special variables $__interval and $__interval_sec, which are automatically set to a certain fraction of the currently displayed time range. This value is then used as the time interval between two neighboring data points in the display. For example, if the complete display time range is 1 min, $__interval_sec may be 10 sec (the sampling interval of our underlying data), but when the display time range is 6h, SysDig may raise $__interval_sec to 60 sec. A (partly confusing) description of this is here: https://docs.sysdig.com/en/docs/sysdig-monitor/using-monitor/using-promql/promql-variables/
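To make that relationship concrete, here is a purely hypothetical model of how a dashboard could derive $__interval_sec from the display range. The 10 sec floor matches our sampling interval. The target of 360 points per panel is an invented constant, chosen only so that the two examples above (1 min gives 10 sec, 6h gives 60 sec) come out right. SysDig's actual rule is described in the linked docs and may well differ.

```python
# Hypothetical model of how $__interval_sec could be chosen.
# ASSUMPTION: TARGET_POINTS is invented for illustration; it is not SysDig's rule.
SAMPLING_INTERVAL_SEC = 10   # raw sampling interval of our data
TARGET_POINTS = 360          # hypothetical maximum number of points per panel

def interval_sec(display_range_sec: int) -> int:
    """Return a step that never drops below the raw sampling interval."""
    return max(SAMPLING_INTERVAL_SEC, display_range_sec // TARGET_POINTS)

if __name__ == "__main__":
    for label, rng in [("1m", 60), ("6h", 6 * 3600), ("1d", 86400)]:
        print(label, interval_sec(rng))
```

With these made-up constants, a 1 min range yields 10 sec, 6h yields 60 sec, and 1 day yields 240 sec.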
I would like to show the CPU usage percentage reported by the collectd CPU plugin, as the sum of user time and system time.
I think that the following PromQL represents that properly:
sum(collectd_cpu_percent{cpu=~"system|user"}) by (....) # (1)
However, in examples I see the $__interval variable used together with an aggregation-over-time function for this kind of problem:
sum(avg_over_time(collectd_cpu_percent{cpu=~"system|user"}[$__interval])) by (....) # (2)
In my experiments, these two queries show exactly the same data points in the display, even when $__interval_sec increases above the sampling interval of the underlying data.
I can understand the logic of (2) to some extent: since the display shows only one data point per $__interval_sec interval, if the underlying sampling data is more granular, the query averages the samples within each $__interval_sec interval to produce the displayed data point.
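My understanding of (2), under plain Prometheus semantics, can be sketched as follows. This is a simplified model, not SysDig's implementation: it assumes a left-open window (t - window, t] for the range selector, ignores the by (....) grouping, and the function names and sample data are hypothetical.

```python
from statistics import mean
from typing import Optional

Series = list[tuple[int, float]]  # raw (timestamp_sec, value) samples

def avg_over_time(samples: Series, t: int, window: int) -> Optional[float]:
    """Mean of the samples with t - window < timestamp <= t (simplified model)."""
    values = [v for ts, v in samples if t - window < ts <= t]
    return mean(values) if values else None

def query2(series_by_cpu: dict[str, Series], t: int, window: int) -> Optional[float]:
    """Model of sum(avg_over_time(...[window])): average each series, then add."""
    per_series = [avg_over_time(s, t, window) for s in series_by_cpu.values()]
    present = [p for p in per_series if p is not None]
    return sum(present) if present else None

# Hypothetical data: 10 s samples for cpu="user" and cpu="system".
series = {
    "user": [(ts, float(ts)) for ts in range(10, 61, 10)],  # 10.0, 20.0, ..., 60.0
    "system": [(ts, 5.0) for ts in range(10, 61, 10)],
}
result = query2(series, t=60, window=60)  # mean(10..60) + 5 = 35 + 5 = 40.0
```

Each series is averaged within its own window first, and only then are user and system added, which mirrors the nesting of sum() around avg_over_time() in (2).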
One might think that (1) uses only the samples at the points in time that are shown in the display, but since it shows exactly the same data points as (2), it seems that some aggregation of the underlying samples over the $__interval_sec interval is also happening under the covers.
I verified that the displayed data points are identical for $__interval_sec values up to 1 day, with an underlying sampling interval of 10 sec, on data that varied sufficiently. If (1) used only the samples at the displayed points in time, that would have shown up as a difference.
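For comparison, here is a sketch of what I would expect from (1) if SysDig followed plain Prometheus instant-vector semantics: at each step, take the most recent raw sample within a lookback window. Prometheus defaults that window to 5 minutes; whether SysDig uses the same value is an assumption. The synthetic data and names are hypothetical.

```python
from statistics import mean
from typing import Optional

LOOKBACK_SEC = 300  # Prometheus default lookback delta; ASSUMED, not confirmed for SysDig
SAMPLING_SEC = 10   # raw sampling interval
STEP_SEC = 60       # stands in for $__interval_sec

def instant_value(samples: list[tuple[int, float]], t: int) -> Optional[float]:
    """Model of query (1): most recent sample at or before t within the lookback."""
    recent = [(ts, v) for ts, v in samples if t - LOOKBACK_SEC < ts <= t]
    return max(recent)[1] if recent else None  # tuple max picks the latest timestamp

# Synthetic raw data: 100 at odd multiples of 10 s, 0 at even multiples.
raw = [(ts, 100.0 if (ts // SAMPLING_SEC) % 2 else 0.0)
       for ts in range(10, 601, SAMPLING_SEC)]

steps = range(STEP_SEC, 601, STEP_SEC)
point_sampled = [instant_value(raw, t) for t in steps]
# Model of query (2): average of the raw samples in each (t - step, t] window.
averaged = [mean(v for ts, v in raw if t - STEP_SEC < ts <= t) for t in steps]
```

Here point_sampled is ten values of 0.0 while averaged is ten values of 50.0. Under this model, the two queries would diverge clearly on such data, which is exactly the kind of difference I did not observe in the SysDig panel.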
My questions are:
- What is the difference between queries (1) and (2)?
- Why are both queries showing the same data points in the SysDig dashboard panel?
- If (1) also aggregates over the sampling data under the covers, which aggregation function is used for that?