Imagine, for example, that I have a lot of users. Every time a user sends a message, he/she can send it to a list of other users (similar to mass-emailing). However, I want to store the message only once to save storage space. So, when one of the message's receivers opens his/her mailbox, the system must query for that message. What's the most efficient way to set up the messaging system's entities (tables)? Note that in a non-RDBMS, join queries are not allowed. Is the following a plausible setup, and how can I make it more efficient (1):
Table: Message (keeps one copy of all messages)
| Message_ID | Sender | Title | Body | List of Receivers |
// In this strategy, if I'm a receiver, I would check each message and search through its list of receivers to see whether I'm one of them.
Or should I go with the following strategy (2):
Table: Message (keeps one copy of all messages)
| Message_ID | Sender | Title | Body |
Table: Message Receivers (store the same message ID for all receivers)
| Message_ID | Sender | Receiver |
// In this strategy, at runtime, I would store one row per receiver, each repeating the same message ID and sender.
Which strategy seems more efficient? In other words, is iterating through an ArrayList slower than iterating through rows in a DBMS?
Any comments would be highly appreciated.
*Note: messages can be arbitrarily long, hence I don't want to store multiple copies of the same message.
Thank you.
Check out Brett Slatkin's 'Building Scalable, Complex Apps on App Engine' talk from Google I/O 2009.
He presents a pattern known as a 'RelationIndex', which is similar to your first suggestion, except that the list of receivers moves to its own entity. By setting the list entity's key-name to the message's key-name, you can find the messages sent to a user with a keys_only query on the list entities, then load only the messages themselves, without the expense of deserializing the lists of recipients.
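To make the idea concrete, here is a minimal in-memory sketch in plain Python. It is not App Engine code: the two dicts stand in for two entity kinds, and every name in it (`messages`, `message_index`, `send`, `inbox_keys`, `inbox`) is made up for illustration. The list comprehension in `inbox_keys` only imitates the keys-only query; in the real datastore that filter would be answered from an index on the receiver list rather than by scanning.

```python
# In-memory stand-in for a datastore with two entity kinds.
# All names here are hypothetical and exist only for this sketch.

messages = {}        # key-name -> message entity (large body, stored once)
message_index = {}   # same key-name -> list of receivers (small, queryable)


def send(key_name, sender, title, body, receivers):
    """Store the message once, and its receiver list in a separate entity."""
    messages[key_name] = {"sender": sender, "title": title, "body": body}
    message_index[key_name] = list(receivers)


def inbox_keys(user):
    """Imitates a keys-only query on the index kind: returns key-names only."""
    return [k for k, receivers in message_index.items() if user in receivers]


def inbox(user):
    """Fetch message entities by key; the receiver lists are not returned."""
    return [messages[k] for k in inbox_keys(user)]


send("m1", "alice", "Hi", "Long body ...", ["bob", "carol"])
send("m2", "bob", "Re: Hi", "Another long body ...", ["alice"])
send("m3", "carol", "Lunch?", "Yet another body ...", ["bob"])

bob_titles = [m["title"] for m in inbox("bob")]
print(bob_titles)  # ['Hi', 'Lunch?']
```

Each message body is stored exactly once, and reading a mailbox touches the receiver lists only through the key lookup, so a long list of receivers never has to be loaded along with the message.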