I am implementing autoscaling policies for my Spring Boot app deployed in PCF. I have read that memory is not a good metric for a Java app because the JVM does not release memory back to the OS very often. Secondly, the CPU utilisation metric is not recommended by PCF.
So I am implementing policies using the latency metric. My doubt is: what exactly is HTTP latency in PCF? Is it the absolute time from when a request arrives until the response is sent? Or the time from when the request was acknowledged? Does it include the queue time before the request gets acknowledged? There is a lot of confusion here. If anyone can clarify this, I can implement the autoscaling policies the right way.
PS: Any other suggestions for autoscaling are also welcome.
It is the full response time for a request as visible from Gorouter's point of view.
An example to explain better:

1. A client sends an HTTP request to your app's route.
2. The request passes through the platform's external load balancer, which forwards it to a Gorouter.
3. The Gorouter receives the request.
4. The Gorouter forwards the request to one of your application instances, which processes it and returns a response.
5. The Gorouter sends the response back until it has been completely delivered to the client.
The latency value used by the autoscaler is a metric exposed by the Gorouter, a TIMER metric. The timer measures the time from the point where the Gorouter received the request to the time the response was completely delivered to the client (i.e. steps 3-5 in the example above). That means any time a request spends waiting after the Gorouter has accepted it, including queuing in front of your application instance, counts toward the latency; only time spent before the request reaches the Gorouter (e.g. at the external load balancer) is excluded.
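If you want a rough feel for how much of that measured latency is spent inside your application versus in front of it, one option (purely a sketch, not anything the platform provides; it assumes Spring Boot 2.x with the javax.servlet API and a hypothetical `RequestTimingFilter` name) is to time requests in a servlet filter and compare the numbers with what the Gorouter reports for the same requests (see below for how to view those):

```java
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import java.io.IOException;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

// Measures only the time spent inside the application instance.
// The Gorouter's latency for the same request will be at least this value,
// plus routing, container networking, and any queuing in front of the app.
@Component
public class RequestTimingFilter implements Filter {

    private static final Logger log = LoggerFactory.getLogger(RequestTimingFilter.class);

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        long start = System.nanoTime();
        try {
            chain.doFilter(request, response);
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            log.info("app-side processing time: {} ms", elapsedMs);
        }
    }
}
```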
To see the actual value for each request, you can run `cf logs` for your app and look at the `[RTR]` entries; the `response_time` field will tell you the latency for that request. You can also use the `cf tail` command to look directly at the TIMER metrics (this requires an additional cf CLI plugin to be installed); it will show you the same numbers.

Latency is a good metric to use as long as your response time does not depend on too many other services, or you have good circuit breakers implemented. Latency can be a problem when slowness in other services reflects into your latency and causes the autoscaler to incorrectly scale your application (when, in fact, it is the upstream service that should be scaled).
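On the circuit breaker point: a breaker that fails fast when a dependency is slow keeps that slowness from inflating your own latency and triggering a scale-up of the wrong app. As a rough sketch only (it assumes Resilience4j, which is not named above, and a hypothetical `DownstreamClient` wrapper), it could look something like this:

```java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;

import java.time.Duration;
import java.util.function.Supplier;

// Wraps calls to a slow dependency so that, when the dependency misbehaves,
// this app fails fast instead of letting the dependency's slowness show up
// in the latency the Gorouter measures for this app.
public class DownstreamClient {

    private final CircuitBreaker circuitBreaker;

    public DownstreamClient() {
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
                .failureRateThreshold(50)                          // open after 50% of calls fail
                .slowCallDurationThreshold(Duration.ofSeconds(2))  // calls slower than 2s count as slow
                .slowCallRateThreshold(50)                         // open after 50% of calls are slow
                .waitDurationInOpenState(Duration.ofSeconds(30))   // stay open for 30s before retrying
                .build();
        this.circuitBreaker = CircuitBreaker.of("downstream", config);
    }

    public String fetchWithFallback(Supplier<String> remoteCall, String fallback) {
        Supplier<String> guarded = CircuitBreaker.decorateSupplier(circuitBreaker, remoteCall);
        try {
            return guarded.get();
        } catch (Exception e) {
            // Breaker is open or the call failed: answer quickly with a fallback
            // rather than adding the dependency's delay to this request.
            return fallback;
        }
    }
}
```

The thresholds above are illustrative; the idea is to tune them so the breaker opens before downstream slowness pushes your own latency past your autoscaling threshold.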
Other options: