Can you scale a StatefulSet horizontally running a relational database in Kubernetes?

640 views Asked by At

Why I'd want to have multiple replicas of my DB?

  1. Redundancy: I have > 1 replicas of my app code. Why? In case one node fails, another can fill its place when run behind a load balancer.
  2. Load: A load balancer can distribute traffic to multiple instances of the app.
  3. A/B testing. I can have one node serve one version of the app, and another serve a different one.
  4. Maintenance. I can bring down one instance for maintenance, and keep the other one up with 0 down-time.

So, I assume I'd want to do the same with the backing db if possible too.

I realize that many nosql dbs are better configured for multiple instances, but I am interested in relational dbs.

I've played with operators like this and this but have found problems with the docs, have not been able to get them up and running and found the community a bit lacking. Relying on this kind of thing in production makes me nervous. The Mysql operator has a note even, saying it's not for production use.

I see that native k8s statefulsets have scaling but these docs aren't specific to dbs at all. I assume the complication is that dbs need to write persistently to disk via a volume and that data has to be synced and routed somehow if you have more than one instance.

So, is this something that's non-trivial to do myself? Or, am I better off having a dev environment that uses a one-replica db image in the cluster in order to save on billing, and a prod environment that uses a fully managed db, something like this that takes care of the scaling/HA for me? Then I'd use kustomize to manage the yaml variances.

Edit:

I actually found a postgres operator that worked great. Followed the docs one time through and it all worked, and it's from postgres docs.

1

There are 1 answers

1
Mikołaj Głodziak On

I have created this community wiki answer to summarize the topic and to make pertinent information more visible.

As Turing85 well mentioned in the comment:

Do NOT share a pvc to multiple db instances. Even if you use the right backing volume (it must be an object-based storage in order to be read-write many), with enough scaling, performance will take a hit (after all, everything goes to one file system, this will stress the FS). The proper way would be to configure clustering. All major relational databases (mssql, mysql, postgres, oracle, ...) do support clustering. To be on the secure side, however, I would recommend to buy a scalable database "as a service" unless you know exactly what you are doing.

The good solution might be to use a single replica StatefulSet for development, to avoid billing and use a fully managed cloud based sql solution in prod. Unless you have the knowledge or a suffiiciently professional operator to deploy a clustered dbms.

Another solution may be to use a different operator as Aaron did:

I actually found a postgres operator that worked great. Followed the docs one time through and it all worked, and it's from postgres: https://www.kubegres.io/doc/getting-started.html

See also this similar question.