is sharding same as distributed database in mongoDB?

3.7k views Asked by At

I want to implement mongodb as a distributed database but i cannot find good tutorials for it. Whenever i searched for distributed database in mongodb, it gives me links of sharding, so i am confused if both of them are the same things?

2

There are 2 answers

0
vmr On

Just some perspective on distributed databases:

In early nineties a lot of applications were desktop based and had a local database which contained MB/GBs of data.

Now with the advent of web based applications there can be millions of users who use and store their data, this data can run into GB/TB/PB. Storing all this data on a single server is economically expensive so there is a cluster of servers(or commodity hardware) across which data is horizontally partitioned. Sharding is another term for horizontal partitioning of data. For example you have a Customer table which contains 100 rows, you want to shard it across 4 servers, you can pick 'key' based sharding in which customers will be distributed as follows: SHARD-1(1-25),SHARD-2(26-50),SHARD-3(51-75),SHARD-4(76-100)

Sharding can be done in 2 ways:

  1. Hash based

  2. Key based

0
yaoxing On

Generally speaking, if you got a read-heavy system, you may want to use replication. Which is 1 primary with at most 50 secondaries. The secondaries share the read stress while the primary takes care of writes. It is a auto-failover system so that when the primary is down, one of the secondaries would take the job there and becomes a new primary.

Sharding, however, is more flexible. All the Shards share write stress and read stress. That is to say, data are distributed into different Shards. And each shard can be consists of a Replication system and auto-failover works as described above.

I would choose replication first because it's simple and is basically enough for most scenarios. And once it's not enough, you can choose to convert from replication to sharding.

There is also another discussion of differences between replication and sharding for your reference.