I am new to MongoDB and NoSQL concepts, and I am stuck deciding how to model a schema that best serves my purpose.
I need to design the schema so that my end result is a list of Posts and Shares sorted by time. I have considered two options:
Option 1: Separate collections for Posts and Shares:
Schema for Post collection:
var mongoose = require('mongoose');

var postSchema = mongoose.Schema({
    postText: String,
    postedBy: String,
    privacy: Number,
    updatedOn: { type: Date, default: Date.now }
}, { collection: 'posts' });
Schema for Share collection:
var shareSchema = mongoose.Schema({
    dis_Id: { type: mongoose.Schema.Types.ObjectId }, // Id of the post that is shared
    shareBy: { type: mongoose.Schema.Types.ObjectId },
    shareText: String,
    share_privacy: Number,
    shareOn: { type: Date, default: Date.now }
}, { collection: 'shares' });
Option 2: Embed the Share in the Post itself
New schema for Post:
var postSchema = mongoose.Schema({
    postText: String,
    postedBy: String,
    updatedOn: { type: Date, default: Date.now },
    privacy: Number,
    share: {
        shareBy: { type: mongoose.Schema.Types.ObjectId },
        shareText: String,
        share_privacy: Number,
        shareOn: { type: Date }
    }
}, { collection: 'posts' });
Now, which of these would be the better choice? Option 1 is harder to query because there are no joins in MongoDB, and Option 2 duplicates the same post data for every share, which could grow to billions of documents with hundreds of thousands of users.
Embedded documents are easy to work with because you get all the required data together, so Option 2 is good in that respect. But if you are concerned about documents growing beyond the 16 MB document size limit, go with Option 1. In that case, do not use an aggregate query to join the two collections, because that will be a time-consuming operation: it will first perform all the other operations and only then perform the skip. Instead, query each collection individually and build the complete response yourself with some custom logic.
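As a rough illustration of that "custom logic", here is a minimal sketch of the merge step only. It assumes you have already fetched one page from each collection, each sorted newest first (for example with something like Post.find().sort({ updatedOn: -1 }).limit(n) and Share.find().sort({ shareOn: -1 }).limit(n)). The function name mergeFeed and the kind/time/item wrapper fields are hypothetical and not part of your schemas.

```javascript
// Hypothetical helper: merges two arrays that are each already sorted
// newest-first into one feed, also newest-first, capped at `limit` items.
// `posts` are expected to carry `updatedOn`, `shares` to carry `shareOn`,
// matching the field names in the Option 1 schemas.
function mergeFeed(posts, shares, limit) {
    var feed = [];
    var i = 0;
    var j = 0;
    while (feed.length < limit && (i < posts.length || j < shares.length)) {
        // An exhausted list gets -Infinity so the other list always wins.
        var postTime = i < posts.length ? new Date(posts[i].updatedOn).getTime() : -Infinity;
        var shareTime = j < shares.length ? new Date(shares[j].shareOn).getTime() : -Infinity;
        if (postTime >= shareTime) {
            feed.push({ kind: 'post', time: posts[i].updatedOn, item: posts[i] });
            i += 1;
        } else {
            feed.push({ kind: 'share', time: shares[j].shareOn, item: shares[j] });
            j += 1;
        }
    }
    return feed;
}

// Usage with made-up sample data (not real documents):
var samplePosts = [
    { postText: 'second post', updatedOn: '2024-01-03T00:00:00Z' },
    { postText: 'first post', updatedOn: '2024-01-01T00:00:00Z' }
];
var sampleShares = [
    { shareText: 'a share', shareOn: '2024-01-02T00:00:00Z' }
];
var sampleFeed = mergeFeed(samplePosts, sampleShares, 10);
```

Fetching n items from each collection is enough to build a correct first page of n merged items. For later pages, one possible approach is to query each collection for items older than the last time already shown, rather than using skip.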