Graphite Server High IO Wait Time


Our Graphite server is hitting CPU and disk bottlenecks. The main problem is that %iowait consistently sits at around 40%. We run it on a RHEL server with a 2-core CPU and 7.5 GiB of RAM. (Agreed, our CPU configuration is modest, but we'd like to understand why the following happens before upgrading it.)

The server processes over 160,000 data points per minute, yet the disk I/O stats show that about 40% of the time is spent in I/O wait. Our CISS disk can write 160 MiB per second, but Graphite only manages to use 2 MiB per second.

Has anyone experienced this issue? What were your findings? Do you have any suggestions?

Thank you very much!


There is 1 answer

Jan On

I'm probably a little late to the party. I read that a spinning disk can do roughly 75-100 I/O operations per second, which makes sense: 7200 rpm = 120 revolutions per second, with a typical latency of 9 ms. If that is the case, the default value of 500 for MAX_UPDATES_PER_SECOND doesn't make sense to me, since it asks for several times more updates per second than such a disk can serve.
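To make that estimate concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the 9 ms figure is the average seek time and that the average rotational latency is half a revolution; both are my assumptions, not measurements, and the function name is made up for illustration.

```python
def estimate_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random IOPS estimate for a single spinning disk.

    Assumption: each random operation costs one average seek plus,
    on average, half a revolution of rotational latency.
    """
    revolutions_per_second = rpm / 60.0                      # 7200 rpm -> 120 rev/s
    full_revolution_ms = 1000.0 / revolutions_per_second     # ~8.33 ms at 7200 rpm
    avg_rotational_latency_ms = full_revolution_ms / 2.0     # ~4.17 ms
    service_time_ms = avg_seek_ms + avg_rotational_latency_ms
    return 1000.0 / service_time_ms


if __name__ == "__main__":
    # Seek plus rotational latency: roughly 76 operations per second.
    print(f"{estimate_iops(rpm=7200, avg_seek_ms=9.0):.0f}")
```

Under those assumptions a 7200 rpm disk lands at the lower end of the 75-100 range, far below 500 updates per second.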

I ran bonnie++ to test my disk performance, and it reached at most about 50 random seeks per second (not in single-user mode).

I was looking at similar performance problems and turned the MAX_UPDATES_PER_SECOND value down to 10. In our case, with around 2000 metrics, this means that each metric is written to disk about once every 200 s, which works for us.
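The arithmetic behind those numbers can be sketched as follows. This is only an illustration: the helper names are hypothetical, and it assumes each update costs roughly one random disk operation, which is a simplification.

```python
def seconds_per_full_pass(metric_count: int, max_updates_per_second: float) -> float:
    """How long it takes to write every metric once at the given update rate."""
    return metric_count / max_updates_per_second


def disk_oversubscription(max_updates_per_second: float,
                          measured_random_iops: float) -> float:
    """Ratio of requested updates to what the disk was measured to deliver.

    Assumption: one update is about one random I/O. A value above 1.0
    means the setting asks for more than the disk can serve.
    """
    return max_updates_per_second / measured_random_iops


if __name__ == "__main__":
    # About 2000 metrics at 10 updates/s: each metric is written every 200 s.
    print(seconds_per_full_pass(2000, 10))   # 200.0

    # Default of 500 against the ~50 seeks/s that bonnie++ reported.
    print(disk_oversubscription(500, 50))    # 10.0

    # Tuned value of 10 against the same disk.
    print(disk_oversubscription(10, 50))     # 0.2
```

With the default setting the disk is asked for about ten times what it delivered in the benchmark, while a value of 10 leaves plenty of headroom at the cost of a longer delay before each metric reaches disk.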