I am creating a MongoDB collection that stores JSON objects, and I am stuck on the sharding part. Each record in the collection has a Case ID, a Customer ID and a location.
The Case ID is a 10-digit number (digits only, no letters).
The Customer ID is a combination of the customer name and the Case ID.
The location is stored as a 2dsphere value, and I expect many distinct locations.
Each record also contains the customer name and a case description. All my queries search on either the Case ID, the Customer ID or the location.
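For reference, here is a minimal sketch of what one such record might look like as a Python dict (the shape pymongo would insert). The field names `caseID`, `customerID`, `name`, `description` and `location`, and the `"<name>-<caseID>"` format for the Customer ID, are assumptions for illustration only, since the exact way the name and Case ID are combined isn't stated. The location is written as a GeoJSON Point, which is what a 2dsphere index expects (longitude first).

```python
import re
from typing import Any, Dict


def make_case_record(case_id: str, name: str, description: str,
                     lon: float, lat: float) -> Dict[str, Any]:
    """Build one case document (hypothetical field names)."""
    # The Case ID must be exactly 10 ASCII digits. It is kept as a string
    # here so a leading zero, if any, is not lost.
    if not re.fullmatch(r"[0-9]{10}", case_id):
        raise ValueError(f"case ID must be 10 digits, got {case_id!r}")
    return {
        "caseID": case_id,
        # Assumed format: customer name and Case ID joined with '-'.
        "customerID": f"{name}-{case_id}",
        "name": name,
        "description": description,
        # GeoJSON Point for a 2dsphere index: [longitude, latitude].
        "location": {"type": "Point", "coordinates": [lon, lat]},
    }
```

Storing the Case ID as a 64-bit integer instead would also work if leading zeros can never occur.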
Given this scenario, can I create a compound shard key on all three values (Case ID, Customer ID and location)? I believe this would give high cardinality and make the records easy to retrieve.
Could anyone please advise whether this is a good approach? I have not found any examples of a compound shard key made up of three fields.
Thanks for your time, and let me know if you need any more information.
The first thing to consider is whether it's necessary to shard. If your data set fits on a single server, then start out with an unsharded deployment. It's easy and seamless to convert this to a sharded cluster later on if needed.
Assuming you do indeed need to shard, your choice of shard key should be based on the following criteria:
You mention that all your queries contain either Case ID, Customer ID or location, but haven't described your use cases. By way of example, let's suppose your most frequent queries are to:
In that case, a good shard key candidate would be a compound shard key on (name, caseID), in that order, backed by a corresponding compound index. Consider whether this satisfies the above criteria:
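A rough sketch of what this shard key looks like in practice, assuming a hypothetical namespace `mydb.cases` and the field names `name` and `caseID`. The first function builds the `shardCollection` command document; with pymongo it could be sent as `client.admin.command(cmd)` (a dict argument is sent as-is, so the command name must come first). The second is a toy check of the prefix rule: mongos can route a query to specific shards when it includes the full shard key or a prefix of a compound shard key, and otherwise broadcasts it to all shards.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical namespace and shard key fields, for illustration only.
NAMESPACE = "mydb.cases"
SHARD_KEY_FIELDS: List[str] = ["name", "caseID"]


def shard_collection_command(namespace: str,
                             fields: Sequence[str]) -> Dict[str, Any]:
    """Build a shardCollection command with an ascending (ranged) key.

    The matching index could be created with pymongo as
    collection.create_index([("name", 1), ("caseID", 1)]).
    """
    # Dict insertion order keeps "shardCollection" as the first key,
    # and keeps the shard key fields in the order given.
    return {"shardCollection": namespace, "key": {f: 1 for f in fields}}


def is_targeted(query: Dict[str, Any], fields: Sequence[str]) -> bool:
    """Toy check: does the filter include a prefix of the shard key?

    Any prefix must start with the first shard key field, so it is enough
    to check that field. This only looks at which top-level fields are
    present and ignores the operators used on them.
    """
    return len(fields) > 0 and fields[0] in query
```

Under this model, a filter on `name` (alone or with `caseID`) can be targeted, while a filter on `caseID` alone is not a prefix of (name, caseID) and would be sent to every shard.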
Note that you cannot use a geospatial index as part of a shard key index (as documented here). However, you can still create and use a geospatial index on a sharded collection as long as the shard key is made up of other fields. So, for example, with the above shard key:
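As a sketch, assuming the location field is called `location`, the geospatial index and a radius search could be expressed as below. `"2dsphere"` is the value of pymongo's `GEOSPHERE` constant, so with pymongo the index could be created as `collection.create_index(GEO_INDEX)`. The query uses `$geoWithin` with `$centerSphere`, whose radius is given in radians (distance divided by the Earth's radius). Because this filter does not include the shard key, mongos would broadcast it to all shards; adding `name` to the filter would let it be targeted.

```python
from typing import Any, Dict, List, Tuple

# Earth's equatorial radius in km, the value MongoDB's documentation uses
# when converting distances to radians for $centerSphere.
EARTH_RADIUS_KM = 6378.1

# Index specification in pymongo's list-of-(field, type) form.
GEO_INDEX: List[Tuple[str, str]] = [("location", "2dsphere")]


def within_radius_query(lon: float, lat: float,
                        radius_km: float) -> Dict[str, Any]:
    """Filter for documents whose location lies within radius_km of a point."""
    if radius_km < 0:
        raise ValueError("radius must be non-negative")
    return {
        "location": {
            "$geoWithin": {
                # [[longitude, latitude], radius in radians]
                "$centerSphere": [[lon, lat], radius_km / EARTH_RADIUS_KM]
            }
        }
    }
```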
Additional documentation on shard key considerations can be found here.