MongoDB Schema for Posts and Shares


I am new to MongoDB and the NoSQL approach, and I am stuck deciding how to model a schema that best serves my purpose.

I need to design the schema so that the end result is a list of Posts and Shares sorted by time. I have considered two options:


Option 1: Separate collections for Posts and Shares:

Schema for Post collection:

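var mongoose = require('mongoose');

// A post authored by a user; updatedOn is the timestamp used for sorting.
var postSchema = mongoose.Schema({
   postText: String,
   postedBy: String,
   privacy: Number,
   updatedOn: { type: Date, default: Date.now }
}, { collection: 'posts' });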

Schema for the Share collection:

var shareSchema = mongoose.Schema({
   dis_Id: { type: mongoose.Schema.Types.ObjectId }, // Id of post that is shared
   shareBy: { type: mongoose.Schema.Types.ObjectId },
   shareText: String,
   share_privacy: Number,
   shareOn: { type: Date, default: Date.now }
}, { collection: 'shares' });

Option 2: Embed the Share in the Post itself

New schema for Post:

var postSchema = mongoose.Schema({
  postText: String,
  postedBy: String,
  updatedOn: { type: Date, default: Date.now }, 
  privacy: Number,
  share: {
    shareBy: { type: mongoose.Schema.Types.ObjectId },
    shareText: String, 
    share_privacy: Number,
    shareOn: { type: Date } 
  }       
}, { collection: 'posts' });

Now, which of these would be the better choice? Option 1 is hard to query because there are no joins in MongoDB, and Option 2 leads to replication of the same data, which could grow to billions of documents for hundreds of thousands of users.

2

There are 2 answers

4
Ricky On

Embedded documents are easy to work with because you get all the required data together, so option 2 is good in that respect. But if you are concerned about documents growing beyond the 16 MB document size limit, go with option 1. In that case, do not use an aggregate query to join the two collections: it will be time-consuming, because the join is performed on all the documents first and the skip is applied only afterwards. Instead, query each collection individually and build the complete response yourself with some custom logic.
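The "custom logic" mentioned above could look roughly like the sketch below. It assumes each collection has already been queried on its own and returned newest-first, and it merges the two lists into one time-sorted feed. The function name `mergeFeed`, the `limit` parameter, and the `kind`/`time`/`doc` wrapper fields are hypothetical and not part of the question.

```javascript
// Merge two arrays that are each already sorted newest-first into one feed.
// `posts` are ordered by `updatedOn` and `shares` by `shareOn`, the date
// fields from the question's schemas. `limit` is a hypothetical page size.
function mergeFeed(posts, shares, limit) {
  var feed = [];
  var i = 0; // next unread post
  var j = 0; // next unread share
  while (feed.length < limit && (i < posts.length || j < shares.length)) {
    // Take the post when shares are exhausted, or when the post is at
    // least as recent as the next share.
    var takePost =
      j >= shares.length ||
      (i < posts.length && posts[i].updatedOn >= shares[j].shareOn);
    if (takePost) {
      feed.push({ kind: 'post', time: posts[i].updatedOn, doc: posts[i] });
      i++;
    } else {
      feed.push({ kind: 'share', time: shares[j].shareOn, doc: shares[j] });
      j++;
    }
  }
  return feed;
}
```

For the first page it is enough to fetch at most `limit` documents from each collection, since the newest `limit` entries of the merged feed can only come from the newest `limit` entries of each list.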

2
SIDDHARTH J MEHTA On

OK. I suggest the following approach:

  1. Since you already have the username, you can retrieve that user's posts in sorted order using sort.

  2. Iterating over each post, you can fetch its shares in sorted order using the same kind of sort.
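The two steps above might be sketched as follows. This is only an illustration: it assumes `Post` and `Share` are Mongoose models compiled from the question's schemas (passed in as arguments here), that the username is stored in `postedBy`, and that `loadPostsWithShares` is a hypothetical helper name.

```javascript
// Sketch of the two steps. `Post` and `Share` are assumed to be Mongoose
// models built from the question's schemas; they are passed in as arguments.
async function loadPostsWithShares(Post, Share, username) {
  // Step 1: the user's posts, newest first.
  var posts = await Post.find({ postedBy: username }).sort({ updatedOn: -1 });

  // Step 2: for each post, its shares, newest first.
  var result = [];
  for (var k = 0; k < posts.length; k++) {
    var shares = await Share.find({ dis_Id: posts[k]._id }).sort({ shareOn: -1 });
    result.push({ post: posts[k], shares: shares });
  }
  return result;
}
```

Note that this issues one share query per post, so the number of round trips grows with the number of posts returned in step 1.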

The key here is understanding the indexes that you will be setting. I suggest the following indexes:

post_schema: compound index on {username, updatedOn}, where username corresponds to postedBy in the question's schema

share_schema: compound index on {dis_Id, shareOn}

Without these compound indexes, the filter-and-sort queries above cannot be served from an index, and your application will not scale to a large number of records.
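With Mongoose, the two compound indexes could be declared on the schemas roughly as below. This is a sketch under assumptions: `addFeedIndexes` is a hypothetical helper, the answer's "username" is taken to be the `postedBy` field, and descending order on the date fields is chosen to match a newest-first sort.

```javascript
// Declares the two compound indexes on schemas passed in by the caller.
// Assumes `postSchema` and `shareSchema` are the Mongoose schemas from the
// question; `addFeedIndexes` is a hypothetical helper name.
function addFeedIndexes(postSchema, shareSchema) {
  // Equality field first, sort field second; -1 keeps newest entries first.
  postSchema.index({ postedBy: 1, updatedOn: -1 });
  shareSchema.index({ dis_Id: 1, shareOn: -1 });
}
```

The helper would be called once, for example `addFeedIndexes(postSchema, shareSchema)`, before the schemas are compiled into models with `mongoose.model(...)`.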