rrdtool graph ignoring --step?

Question

282 views Asked by powo At 09 February 2021 at 15:07

I have RRD Files with multiple months of PDP Data (5min Interval).

For general-purpose graphs it's fine when rrdtool automatically decides which RRA to use for displaying the graph.

But some of my graphs contain 95th-percentile data in the legend, which I need to be calculated from "exact" 5-minute-interval data, because calculating a percentile from aggregated data points can (by its nature) lead to dramatically incorrect values.

I can fetch data from the RRD file with a step of 300, and I get the right data to calculate the percentile on my own.
PROBLEM: When graphing with a step of 300, the displayed percentile value varies depending on the width of the graph, even if the time range is the same and 300s data is available for the whole time range:
if the width for a 1-month graph is 800px, the shown percentile (and also the max values, for example) is wrong
if the width for a 1-month graph is 8000px, the values are correct (matching the self-calculated values from the fetched data)

graph:

...
--step 300
...
"VDEF:perca=a,95,PERCENT",
...

created with:

        '-s', '300',
       ...
        "RRA:AVERAGE:0.5:1:53568",      # 6 months pdp
        "RRA:AVERAGE:0.5:12:8904",      # 1 hour, 1 year.
        "RRA:AVERAGE:0.5:288:730",      # 1 day, 2 years.
        "RRA:AVERAGE:0.5:2016:520",     # 1 week, 10 years.
        "RRA:MAX:0.5:1:600",            # 5 min: 2 days
        "RRA:MAX:0.5:12:8904",          # 1 hour, 1 year.
        "RRA:MAX:0.5:288:730",          # 1 day, 2 years.
        "RRA:MAX:0.5:2016:520",         # 1 week, 10 years

There is 1 answer

**Steve Shipway** · Accepted Answer · 2021-02-09T19:23:39+00:00

This is due to data consolidation being performed prior to the VDEF calculation.

Although your rrdtool graph arguments specify a step of 300s, this is narrower than one pixel of the graph, so the data series is averaged further before it reaches the VDEF. All the CDEF and VDEF functions always work on a time series of one cdp per pixel. From the RRDTool manual:

Note: a step smaller than one pixel will silently be ignored.

This means that, while you can decrease the resolution of the data, you cannot increase it. Sadly, to get an accurate 95th Percentile, you need higher-resolution data.

So, if you omit the --step 300 in a narrow graph, what will happen is:

You ask for a 1-month time window
RRDTool calculates 1 pixel is about 1 hour
DS retrieves an Average time series from the 1-hour RRA, one cdp per pixel (i.e. per hour)
VDEF then consolidates this to a 95th percentile
The 95th percentile calculation is inaccurate

With the --step 300 it is a slightly different process, but the same result:

You ask for a 1-month time window, with step 300
RRDTool calculates 1 pixel is about 1 hour
RRDTool DS retrieves a month's worth of data from the 300s RRA
RRDTool further consolidates this data down to 1 cdp per pixel (i.e. per hour) using Average
VDEF then consolidates this to a 95th percentile
The 95th percentile calculation is inaccurate

So, you can see the final outcome is the same - the only difference is where the 300s -> 1h consolidation happens, either in the RRA or at graph time.
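To see why averaging before the percentile step matters, here is a minimal Python sketch. The data is synthetic (one 5-minute burst per hour, invented purely for illustration), and `percentile` uses a simple nearest-rank rule, which is an assumption and may not match RRDTool's own PERCENT arithmetic exactly.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (assumed definition)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def consolidate_average(values, factor):
    """Average consecutive groups of `factor` samples, as an AVERAGE consolidation would."""
    groups = [values[i:i + factor] for i in range(0, len(values), factor)]
    return [sum(g) / len(g) for g in groups]


# Synthetic month: every hour has eleven quiet 5-minute samples and one burst.
one_hour = [100.0] * 11 + [900.0]
samples = one_hour * (24 * 30)

raw_p95 = percentile(samples, 95)                               # from 5-minute data
hourly_p95 = percentile(consolidate_average(samples, 12), 95)   # after 1-hour averaging

print(f"95th percentile of 5-minute data: {raw_p95:.1f}")
print(f"95th percentile of hourly averages: {hourly_p95:.1f}")
```

In this contrived series the bursts make up more than 5% of the raw samples, so the raw percentile lands on the burst value, while every hourly average is the same much lower number. Real traffic will be less extreme, but the direction of the error is the same.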

When using a wide graph, the time per pixel becomes smaller, and RRDTool then no longer needs to perform its additional consolidation of the data, resulting in a more accurate calculation:

You ask for a 1-month time window
RRDTool calculates 1 pixel is about 5 minutes
RRDTool DS retrieves a month's worth of data from the 300s RRA
No further consolidation is required
VDEF then consolidates this to a 95th percentile
The 95th percentile calculation is accurate!
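The "about 1 hour" and "about 5 minutes" figures are just the time range divided by the graph width. A rough Python sketch of that arithmetic, assuming a 30-day month and ignoring whatever rounding RRDTool applies internally:

```python
import math

MONTH_SECONDS = 30 * 24 * 3600  # assumption: a "month" is 30 days
STEP_SECONDS = 300              # the 5-minute PDP interval from the question


def seconds_per_pixel(time_range_s, width_px):
    """Time span covered by one horizontal pixel of the graph."""
    return time_range_s / width_px


for width in (800, 8000):
    spp = seconds_per_pixel(MONTH_SECONDS, width)
    print(f"{width}px wide: {spp:.0f}s per pixel ({spp / 60:.1f} min)")

# Width at which one pixel spans exactly one 300s step.
width_for_one_step = math.ceil(MONTH_SECONDS / STEP_SECONDS)
print(f"one step per pixel needs about {width_for_one_step}px")
```

At 800px each pixel covers 54 minutes, so roughly eleven 5-minute samples are merged into one point; at 8000px each pixel covers a little over 5 minutes.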

When you retrieve the raw data using rrdtool fetch, this extra consolidation doesn't happen, so you get:

You ask for a 1-month time window with step 300
RRDTool DS retrieves a month's worth of data from the 300s RRA
These data are output
Your spreadsheet then calculates a 95th percentile
The 95th percentile calculation is correct (well, as close as you can be with a 5min interval)
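The fetch-and-calculate steps above could be scripted instead of done in a spreadsheet. The following is a sketch only: `fetch_command`, `parse_fetch` and `percentile` are hypothetical helper names, the sample text imitates the usual `rrdtool fetch` output layout (a header line of DS names, then `timestamp: value` rows), and the nearest-rank percentile is an assumed definition.

```python
import math


def fetch_command(rrd_path, start, end, step=300):
    """Build an `rrdtool fetch` command line requesting `step`-second AVERAGE data."""
    return ["rrdtool", "fetch", rrd_path, "AVERAGE",
            "--resolution", str(step), "--start", str(start), "--end", str(end)]


def parse_fetch(text):
    """Parse fetch output into {ds_name: [values]}, dropping unknown (NaN) rows."""
    names, series = [], {}
    for line in text.splitlines():
        if not line.strip():
            continue
        if ":" not in line:            # header line listing the DS names
            names = line.split()
            series = {name: [] for name in names}
            continue
        _, rest = line.split(":", 1)
        for name, token in zip(names, rest.split()):
            value = float(token)       # float() also accepts "nan" and "-nan"
            if not math.isnan(value):
                series[name].append(value)
    return series


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (assumed definition)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up sample in the shape of fetch output; "a" is an illustrative DS name.
SAMPLE = """\
                          a

1612137600: 1.0000000000e+02
1612137900: 9.0000000000e+02
1612138200: -nan
1612138500: 1.0000000000e+02
"""

series = parse_fetch(SAMPLE)
print(series["a"], percentile(series["a"], 95))
```

In real use the text would come from running the list returned by `fetch_command` (for example with `subprocess.run(..., capture_output=True, text=True)`). Start and end times that are multiples of the step make it more likely that the 300s RRA is the one selected.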

Your next question will likely be, how do I stop this from happening? The unfortunate answer is that you cannot. RRDTool does not have a Percentile type CF, and so the correct calculations cannot be performed in the RRA (this would be the only real solution).

The Routers2 frontend for MRTG calculates 95th Percentiles for its graphs, and the way it does it is to perform a high-resolution fetch to get the raw data, calculate the value internally, and then pass it in as an HRULE when making the graph. In other words, it doesn't use a VDEF at all, because of the problem you are experiencing.
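A minimal sketch of the same idea in Python, assuming the percentile has already been computed from fetched 300s data. This is not Routers2's actual code; `graph_args_with_p95`, the file names, the DS name `ds0` and the colours are all hypothetical.

```python
def graph_args_with_p95(png_path, rrd_path, ds_name, p95, start, end):
    """Build `rrdtool graph` arguments that draw a precomputed percentile as an HRULE."""
    return [
        "rrdtool", "graph", png_path,
        "--start", str(start), "--end", str(end),
        f"DEF:a={rrd_path}:{ds_name}:AVERAGE",
        "LINE1:a#0000FF:traffic",
        # The value is fixed before graphing, so graph-time consolidation cannot change it.
        f"HRULE:{p95:.2f}#FF0000:95th percentile {p95:.2f}",
    ]


args = graph_args_with_p95("out.png", "data.rrd", "ds0", 900.0, 1609459200, 1612137600)
print(" ".join(args))
```

The list can be passed to `subprocess.run`. Paths or legend text containing a colon would need escaping before being placed in the DEF or HRULE arguments.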
