Mesos cpu soft-limit dangers?


I recently enabled cgroups/cpu isolation on my Mesos cluster. I've been running some stress tests (for example, starting some CPU-bound programs and then checking whether a bursty program can jump in and claim its CPU allocation), and Mesos appears to be slicing the CPU correctly. However, I've seen some posts claiming that it's dangerous to let CPU-bound programs take all of the idle CPU.
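
To make "slicing the CPU correctly" concrete, here is a minimal, idealized sketch of the proportional-share behaviour that cgroup `cpu.shares` aims for. The function `share_cpu` and the task names are hypothetical; the weights stand in for shares and are assumed to be proportional to each task's Mesos `cpus`. Real CFS scheduling is noisier and does not rebalance instantly.

```python
# Idealized proportional-share ("soft limit") CPU division, as cpu.shares aims for.
# Assumption: weight is proportional to the task's Mesos `cpus` (all weights > 0).

def share_cpu(capacity, tasks):
    """Split `capacity` cores among tasks by weight, capped at each task's demand.

    tasks: dict of name -> (weight, demand_in_cores)
    Capacity a task does not use is handed to the others in proportion to
    their weights (work-conserving), i.e. weighted max-min fair sharing.
    """
    granted = {name: 0.0 for name in tasks}
    active = {name for name, (_, demand) in tasks.items() if demand > 0}
    remaining = float(capacity)
    while active and remaining > 1e-12:
        total_weight = sum(tasks[n][0] for n in active)
        # Tentative share of what is left, proportional to weight.
        offer = {n: remaining * tasks[n][0] / total_weight for n in active}
        satisfied = {n for n in active if tasks[n][1] <= offer[n]}
        if not satisfied:
            # Every remaining task wants more than its share: hand it out and stop.
            for n in active:
                granted[n] = offer[n]
            break
        # Fully satisfy the light consumers, then redistribute what is left.
        for n in satisfied:
            granted[n] = float(tasks[n][1])
            remaining -= tasks[n][1]
        active -= satisfied
    return granted


if __name__ == "__main__":
    # 4-core host: two 1-cpu batch jobs want 4 cores each; a 2-cpu service is idle.
    print(share_cpu(4, {"batch1": (1, 4), "batch2": (1, 4), "service": (2, 0)}))
    # {'batch1': 2.0, 'batch2': 2.0, 'service': 0.0}
    # The service bursts and reclaims its weighted share from the batch jobs.
    print(share_cpu(4, {"batch1": (1, 4), "batch2": (1, 4), "service": (2, 4)}))
    # {'batch1': 1.0, 'batch2': 1.0, 'service': 2.0}
```

While the service is idle, the batch jobs soak up its cores; once it bursts, it gets back exactly its weighted share, which matches the stress test described above.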

I'm trying to understand exactly what the dangers of soft-limiting CPU are. Is the problem that a critical task may not be able to use its full CPU allocation immediately? In what situations would soft limits on CPU cause problems? The alternative to my current setup is enforcing hard limits with CFS, but my programs tend to be idle most of the time.
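
For reference, a hedged sketch of how a task's `cpus` value might translate into cgroup (v1) settings in each mode. The constants (1024 shares per CPU with a minimum of 2, a 100 ms CFS period, a 1 ms minimum quota) reflect my understanding of the Mesos cgroups/cpu isolator and should be checked against your Mesos version. `cgroup_cpu_settings` is a hypothetical helper, and the hard-limit mode is assumed to correspond to the agent flag `--cgroups_enable_cfs`.

```python
# Hedged sketch: Mesos `cpus` -> cgroup v1 cpu controller values.
# Constants are assumptions based on the Mesos cgroups/cpu isolator; verify them.
CPU_SHARES_PER_CPU = 1024
MIN_CPU_SHARES = 2
CFS_PERIOD_US = 100_000      # 100 ms scheduling period
MIN_CFS_QUOTA_US = 1_000     # 1 ms minimum quota


def cgroup_cpu_settings(cpus, enable_cfs):
    """Return the cgroup values a task with `cpus` would roughly receive."""
    # Soft limit: a relative weight that only matters when the CPU is contended.
    settings = {"cpu.shares": max(int(CPU_SHARES_PER_CPU * cpus), MIN_CPU_SHARES)}
    if enable_cfs:
        # Hard limit: at most `quota` microseconds of CPU per `period`,
        # regardless of how idle the rest of the host is.
        settings["cpu.cfs_period_us"] = CFS_PERIOD_US
        settings["cpu.cfs_quota_us"] = max(int(CFS_PERIOD_US * cpus), MIN_CFS_QUOTA_US)
    return settings


if __name__ == "__main__":
    print(cgroup_cpu_settings(0.5, enable_cfs=False))
    # {'cpu.shares': 512}
    print(cgroup_cpu_settings(0.5, enable_cfs=True))
    # {'cpu.shares': 512, 'cpu.cfs_period_us': 100000, 'cpu.cfs_quota_us': 50000}
```

The difference matters for mostly-idle programs: `cpu.shares` is only a relative weight under contention, while `cpu.cfs_quota_us` is an absolute cap, so a 0.5-cpu task can run at most 50 ms in every 100 ms window even on an otherwise idle host.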

I use Marathon and Chronos (latest stable versions) to schedule tasks on my Mesos cluster (also the latest stable version).

Answer by Kenny Ingle

The main danger of soft-limiting CPU is the inherent uncertainty: "Explicit is better than implicit." You hope your task gets scheduled on a host whose other tasks are mostly idle, but it might not be so lucky. In the unlucky case where other tasks on the host are bursting, your task's performance suffers compared with running in an environment with hard limits. You may value predictability more than burst-ability. Ideally, we might even want a mix of both.
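
A toy model of that uncertainty: a task allocated 2 cpus that wants 4, sharing a host with a 2-cpu neighbor. The helpers below are hypothetical and assume an idealized, fully allocated two-task host. They capture only how much CPU time the task can get, not scheduling latency, cache effects, or anything else that makes real contention messier.

```python
# Toy model: two tasks on a host whose capacity equals the sum of their allocations.

def soft_limited(demand, alloc, neighbor_demand, neighbor_alloc):
    """Cores the task gets under cpu.shares (work-conserving soft limit)."""
    capacity = alloc + neighbor_alloc
    # Under contention the task gets its weighted share, which equals `alloc`
    # on this fully allocated host; otherwise it may also use the neighbor's idle cores.
    return min(demand, max(alloc, capacity - neighbor_demand))


def hard_limited(demand, alloc):
    """Cores the task gets under a CFS quota: never more than its allocation."""
    return min(demand, alloc)


if __name__ == "__main__":
    for neighbor_demand in (0, 1, 4):
        soft = soft_limited(4, 2, neighbor_demand, 2)
        hard = hard_limited(4, 2)
        print(f"neighbor wants {neighbor_demand}: soft={soft}, hard={hard}")
    # neighbor wants 0: soft=4, hard=2
    # neighbor wants 1: soft=3, hard=2
    # neighbor wants 4: soft=2, hard=2
```

Under the soft limit the same task gets anywhere from 2 to 4 cores depending on what its neighbor happens to be doing; under the hard limit it gets 2 every time. That spread is the predictability you trade for burst-ability.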

That being said, hard limits are not necessarily a silver bullet either. I can't speak to the reasoning behind the posts you mention, but even the Marathon docs note that CFS may not be appropriate for every workload: https://mesosphere.github.io/marathon/docs/cfs.html