Compared with other orchestration tools like Dokku, DC/OS, Deis, Flynn, Docker Swarm, etc., Kubernetes is nowhere near them in terms of lines of code: on average, those tools are around 100k-200k lines of code.
Intuitively, it feels strange that managing containers (i.e. checking health, scaling containers up and down, killing them, restarting them, etc.) should require 2.4M+ lines of code (which is the scale of an entire operating system code base). I feel like there is something more to it.
What is different in Kubernetes compared to other orchestration solutions that makes it so big?
I don't have any experience maintaining more than 5-6 servers. Please explain why it is so big and which functionalities play a big part in it.
First and foremost: don't be misled by the number of lines in the code. Most of it is dependencies in the vendor folder, which do not account for the core logic (utilities, client libraries, gRPC, etcd, etc.).
Raw LoC Analysis with cloc
To put things into perspective, for Kubernetes:
For Docker (not Swarm or Swarm mode, as those include more features like volumes, networking, and plugins that are not in these repositories). We do not include projects like Machine, Compose, or libnetwork, so in reality the whole Docker platform might include many more LoC:
Roughly, it seems like the project's own code accounts for about half of the LoC mentioned in the question (~1250K LoC), the rest being dependencies (whether you count dependencies or not is subjective).
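cloc is the right tool for reproducing this kind of split. As a rough illustration only, here is a minimal Python sketch that assumes a local checkout of the repository and assumes that vendor, examples and docs are the directories to set aside (that exclusion list is my assumption, not an official one). It counts non-blank lines in .go files, so unlike cloc it also counts comments; `count_go_lines` is a hypothetical helper name.

```python
import os
from collections import Counter

# Assumed set of "not core logic" directories; adjust to taste.
EXCLUDED = frozenset({"vendor", "examples", "docs"})


def count_go_lines(root: str, excluded: frozenset = EXCLUDED) -> Counter:
    """Count non-blank lines in .go files under `root`.

    Files whose path contains any excluded directory name (at any depth)
    go into the "excluded" bucket; everything else goes into "core".
    Comments are counted too, so this is cruder than cloc.
    """
    counts: Counter = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        parts = os.path.relpath(dirpath, root).split(os.sep)
        bucket = "excluded" if excluded.intersection(parts) else "core"
        for name in filenames:
            if not name.endswith(".go"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                counts[bucket] += sum(1 for line in fh if line.strip())
    return counts
```

Calling `count_go_lines("path/to/kubernetes")` returns a Counter with "core" and "excluded" totals, which makes it easy to see how much of the headline number is dependencies, examples and docs.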
What is included in Kubernetes that makes it so big?
Most of the bloat comes from libraries supporting various cloud providers, either to ease bootstrapping on their platforms or to support specific features (volumes, etc.) through plugins. It also has a lot of examples that should be dismissed from the line count: a fair LoC estimate needs to exclude the documentation and example directories.
It is also much more feature-rich than Docker Swarm, Nomad or Dokku, to cite a few. It supports advanced networking scenarios, has built-in load balancing, and includes PetSets, Cluster Federation, volume plugins, and other features that other projects do not support yet.
It supports multiple container engines, so it is not limited to running Docker containers and could run containers through other engines (such as rkt).
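To see why supporting several engines adds code, here is a hypothetical Python sketch (not Kubernetes' actual API; `ContainerRuntime`, `FakeDockerRuntime`, `FakeRktRuntime` and `scale` are invented names). The orchestration logic is written once against an engine-neutral interface, but every engine still needs its own adapter. Here the adapters only keep in-memory state, whereas real ones would each talk to a different engine API.

```python
from abc import ABC, abstractmethod


class ContainerRuntime(ABC):
    """Hypothetical engine-neutral interface the orchestrator codes against."""

    @abstractmethod
    def start(self, image: str) -> str:
        """Start a container from `image` and return its ID."""

    @abstractmethod
    def stop(self, container_id: str) -> None:
        """Stop the container with the given ID."""


class FakeDockerRuntime(ContainerRuntime):
    """Stand-in adapter; a real one would call the Docker engine API."""

    def __init__(self) -> None:
        self.running: dict[str, str] = {}
        self._next = 0

    def start(self, image: str) -> str:
        self._next += 1
        cid = f"docker-{self._next}"
        self.running[cid] = image
        return cid

    def stop(self, container_id: str) -> None:
        self.running.pop(container_id, None)


class FakeRktRuntime(ContainerRuntime):
    """Stand-in adapter; a real one would drive rkt instead."""

    def __init__(self) -> None:
        self.running: dict[str, str] = {}
        self._next = 0

    def start(self, image: str) -> str:
        self._next += 1
        cid = f"rkt-{self._next}"
        self.running[cid] = image
        return cid

    def stop(self, container_id: str) -> None:
        self.running.pop(container_id, None)


def scale(runtime: ContainerRuntime, image: str, replicas: int) -> list[str]:
    """Engine-agnostic logic: start `replicas` containers on any runtime."""
    return [runtime.start(image) for _ in range(replicas)]
```

The shared logic (`scale`) stays small, while the per-engine adapters multiply with every supported engine, which is one way a multi-engine design grows the codebase.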
A lot of the core logic involves interaction with other components (key-value stores, client libraries, plugins, etc.), which extends far beyond simple scenarios.
Distributed systems are notoriously hard, and Kubernetes seems to support a majority of the tooling from key players in the container industry without compromise (where other solutions do make such compromises). As a result, the project can look artificially bloated and too big for its core mission (deploying containers at scale). In reality, these statistics are not that surprising.
Key idea
Comparing Kubernetes to Docker or Dokku is not really appropriate. The scope of the project is far bigger, and it includes many more features, as it is not limited to the Docker family of tooling.
While Docker has a lot of its features scattered across multiple libraries, Kubernetes tends to keep everything in its core repository (which inflates the line count substantially but also explains the popularity of the project).
Considering this, the LoC statistic is not that surprising.