Can a relational database scale horizontally?


After some googling, I found this note in the MySQL docs:

MySQL Cluster automatically shards (partitions) tables across nodes, enabling databases to scale horizontally on low cost, commodity hardware to serve read and write-intensive workloads, accessed both from SQL and directly via NoSQL APIs.

Can a relational database scale horizontally? Would it be based on a NoSQL database in some way?

Does anyone have a real-world example?

How can I manage SQL requests, transactions, and so on in such a database?


There are 6 answers

5
Keshav On BEST ANSWER

It is possible, but it takes a lot of maintenance effort. Explanation:

Vertical partitioning of data (comparable to normalisation in SQL databases) means splitting it column-wise into multiple tables in order to reduce redundant storage. Example of a user table:

[image: user table split column-wise]

Horizontal partitioning of data (synonymous with sharding) means splitting it row-wise into multiple tables in order to reduce the time taken to fetch data. Example of a user table:

[image: user table split row-wise]
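The two kinds of split described above can be sketched in a few lines of Python. This is only an illustration: the rows, the column names, and the `split_columns` / `split_rows` helpers are made up for the example and are not taken from any database product.

```python
# Hypothetical user rows; the column names are assumptions for illustration.
users = [
    {"id": 1, "name": "Ann", "email": "ann@example.com", "city": "Oslo"},
    {"id": 2, "name": "Bob", "email": "bob@example.com", "city": "Pune"},
    {"id": 3, "name": "Cy", "email": "cy@example.com", "city": "Lima"},
    {"id": 4, "name": "Di", "email": "di@example.com", "city": "Kobe"},
]


def split_columns(rows, key, column_groups):
    """Column-wise split: one table per column group, each keeping the key."""
    return [
        [{key: row[key], **{c: row[c] for c in cols}} for row in rows]
        for cols in column_groups
    ]


def split_rows(rows, key, shard_count):
    """Row-wise split (sharding): each row goes to exactly one shard."""
    shards = [[] for _ in range(shard_count)]
    for row in rows:
        shards[row[key] % shard_count].append(row)
    return shards


# Column-wise: every table has all users, but only some of the columns.
profile, contact = split_columns(users, "id", [["name", "city"], ["email"]])

# Row-wise: every shard has all columns, but only some of the users.
shard_0, shard_1 = split_rows(users, "id", 2)
```

Here `profile` and `contact` each hold four rows with fewer columns, while `shard_0` and `shard_1` each hold two complete rows.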

The key point to note is that tables in SQL databases are normalised into multiple tables of related data. To shard such a table across multiple machines, you would need to shard the related normalised data accordingly, which in turn increases the maintenance effort. Take, as in the example above, a SQL database in which the Customer table has a one-to-many relationship with the Order table.

If you move some rows of customer data onto another machine (that is, shard them), you would also need to move the related order data onto the same machine, which becomes a troublesome task when there are multiple related tables.
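As a rough sketch of that co-location requirement, the snippet below uses two in-memory SQLite databases to stand in for two machines and routes both customers and their orders by customer id. The schema, the shard count, and the helper names (`shard_for`, `add_customer`, `add_order`, `orders_with_names`) are assumptions for illustration, not how any particular sharding product works.

```python
import sqlite3

SHARD_COUNT = 2  # assumed number of machines


def open_shard():
    """Create one in-memory database standing in for one machine."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT)"
    )
    return conn


shards = [open_shard() for _ in range(SHARD_COUNT)]


def shard_for(customer_id):
    """Route by customer id so a customer and its orders share a shard."""
    return shards[customer_id % SHARD_COUNT]


def add_customer(customer_id, name):
    with shard_for(customer_id) as conn:  # commits on success
        conn.execute("INSERT INTO customer VALUES (?, ?)", (customer_id, name))


def add_order(order_id, customer_id, item):
    # Orders are routed by customer_id, not by their own id.
    with shard_for(customer_id) as conn:
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?)", (order_id, customer_id, item)
        )


def orders_with_names(customer_id):
    """A local join works because both tables live on the same shard."""
    return shard_for(customer_id).execute(
        "SELECT c.name, o.item FROM customer c "
        "JOIN orders o ON o.customer_id = c.id "
        "WHERE c.id = ? ORDER BY o.id",
        (customer_id,),
    ).fetchall()


add_customer(1, "Ann")
add_customer(2, "Bob")
add_order(10, 1, "book")
add_order(11, 1, "pen")
add_order(12, 2, "lamp")
```

Every additional table related to Customer would need the same routing rule, which is where the maintenance effort comes from.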

It is convenient for NoSQL databases to shard because they follow a flat table structure (data is stored in aggregated form rather than normalised form).
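To make "aggregated form" concrete, here is a small sketch that folds normalised customer and order rows into one self-contained document per customer. The data and the `to_aggregates` helper are hypothetical; real document stores have their own formats and APIs.

```python
# Hypothetical normalised rows, as they might sit in two SQL tables.
customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [
    {"id": 10, "customer_id": 1, "item": "book"},
    {"id": 11, "customer_id": 1, "item": "pen"},
    {"id": 12, "customer_id": 2, "item": "lamp"},
]


def to_aggregates(customers, orders):
    """Embed each customer's orders inside the customer record."""
    docs = {
        c["id"]: {"id": c["id"], "name": c["name"], "orders": []}
        for c in customers
    }
    for o in orders:
        docs[o["customer_id"]]["orders"].append({"id": o["id"], "item": o["item"]})
    return list(docs.values())


documents = to_aggregates(customers, orders)
```

Each entry in `documents` can be placed on any machine on its own, because it no longer depends on rows stored in another table.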

2
theMayer On

I think the answer is, unequivocally, yes. You have to keep in mind that SQL is simply a data access language. There is absolutely no reason why it can't be extended across multiple computers and network partitions. Is it a challenging problem? Most certainly, and that's why software that does it is in its infancy.

Now, I think what you are trying to ask is "Can all the features that I am familiar with, and that come with a standard SQL-type relational database management system, be made to work across multiple servers in this manner?" While I admit I haven't studied the problem in depth, there are theorems out there that say "No, they cannot." The CAP (Consistency, Availability, Partition tolerance) theorem states that a distributed system cannot guarantee all three properties at the same time.

Now, for all practical purposes, "sharding" or "partitioning" or whatever you want to call it is not going away; quite the contrary. This means that, to the degree that the CAP theorem holds, we are going to have to shift the way we think about databases and how we interact with them (at least to an extent). Many developers have already made the shift necessary to be successful on a NoSQL platform, but many more have not. Ultimately, the model will mature enough, and workarounds will become effective enough, that traditional SQL databases, in the sense you refer to, will be more or less practical across multiple machines. This is already starting to pan out, and I would say give it a few more years and we'll be at that point. Or we'll have collectively shifted our thinking to the point where it is no longer necessary, and the world will be a better place. :)

0
illuminato On

Yes it can. It is called NewSQL.

NewSQL is a new approach to relational databases that wants to combine transactional ACID (atomicity, consistency, isolation, durability) guarantees of good ol’ RDBMSs and the horizontal scalability of NoSQL. Source

Examples of databases:

  • User-Shared MySQL Cluster
  • Citus (PostgreSQL extension)
  • CockroachDB
  • Azure Cosmos DB
  • Google Spanner
  • NuoDB
  • Vitess
  • Splice Machine (part of Hadoop ecosystem)
  • MemSQL (in memory store)
  • VoltDB (in memory store)

Examples of data warehouses:

  • IBM Netezza
  • Oracle
  • Teradata
  • Hive Engine (part of Hadoop ecosystem)
  • Spark SQL (part of Hadoop ecosystem)
4
code4kix On

Thanks for the question and answer. I was trying to explain this to someone like this:

In terms of the CAP theorem, you can't have all three. So when a partition (network or server failure) occurs:

  • A relational database gives you C (consistency). So when P (a partition, i.e. a server or network failure) occurs, you can't have A (availability): the database goes down.

  • Some NoSQL datastores favor A when P occurs, so you can't have C: one or more of your replicated partitions will be out of sync until the network comes back and they all sync up. Such a store is only eventually consistent.

  • As noted below in the comments by Manish, there are other NoSQL datastores that favor C when P occurs, at the expense of A.

PS: edited #2 and added #3 to complete all scenarios. The intention behind my original answer was to provide an overly simplistic perspective on the trade-offs between C, A and P. That is why I omitted #3.
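The trade-off in the list above can be played out with a toy two-replica store. This is a deliberately simplistic sketch in the same spirit as the answer: the `TwoNodeStore` class, its `"CP"` / `"AP"` mode labels, and the `heal` step are all invented for illustration and do not model any real database's replication protocol.

```python
class Replica:
    """One copy of a single value."""

    def __init__(self):
        self.value = None


class TwoNodeStore:
    """Toy store with two replicas; mode is 'CP' or 'AP' (assumed labels)."""

    def __init__(self, mode):
        self.mode = mode
        self.replicas = [Replica(), Replica()]
        self.partitioned = False  # True while the replicas cannot talk

    def write(self, node, value):
        if self.partitioned:
            if self.mode == "CP":
                # Favor consistency: refuse the write, giving up availability.
                raise RuntimeError("unavailable: cannot reach the other replica")
            # Favor availability: accept locally and let the replicas diverge.
            self.replicas[node].value = value
            return
        for replica in self.replicas:
            replica.value = value

    def read(self, node):
        return self.replicas[node].value

    def heal(self, winner):
        """Partition ends; both replicas converge on the winner's value."""
        self.partitioned = False
        value = self.replicas[winner].value
        for replica in self.replicas:
            replica.value = value


cp = TwoNodeStore("CP")
cp.write(0, "a")
cp.partitioned = True
try:
    cp.write(0, "b")
    cp_accepted = True
except RuntimeError:
    cp_accepted = False  # the CP store refused the write

ap = TwoNodeStore("AP")
ap.write(0, "a")
ap.partitioned = True
ap.write(0, "b")       # accepted on node 0 only
stale = ap.read(1)     # node 1 still returns the old value
ap.heal(winner=0)
converged = ap.read(1)  # both nodes agree again after the partition heals
```

The CP store stays consistent but rejects the write during the partition; the AP store accepts it and serves a stale value on the other node until the replicas sync up.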

0
Liang Zhang On

Yes, but the data needs to be migrated when storage grows.

Some open-source tools support this, for example Vitess or Apache ShardingSphere.
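To get a feel for why migration is needed, the sketch below counts how many keys change shard when a simple modulo scheme grows from three shards to four. The modulo rule and the `shard_of` / `keys_to_move` helpers are assumptions made for the example; Vitess and Apache ShardingSphere have their own resharding mechanisms, which are not shown here.

```python
def shard_of(key, shard_count):
    """Assumed placement rule: plain modulo on an integer key."""
    return key % shard_count


def keys_to_move(keys, old_count, new_count):
    """Keys whose shard changes when the shard count changes."""
    return [
        k for k in keys
        if shard_of(k, old_count) != shard_of(k, new_count)
    ]


keys = range(12)
moved = keys_to_move(keys, old_count=3, new_count=4)
```

With this naive rule, 9 of the 12 keys land on a different shard after adding one machine, which is why schemes that move fewer keys (such as consistent hashing or range-based splits) are commonly used instead.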

0
anegru On

Google Spanner is an example of a relational database that can scale horizontally. Sharding and replication are done automatically, so there is no need to manage them yourself. For more information, please check out this paper.