I'm still learning my lessons about data modeling in bigtable/nosql and would appreciate some feedback. Would it be fair to say that I should avoid parent->child relationships in my data modeling if I frequently need to deal with the children in aggregate across parents?
As an example, let's say I'm building a blog that will be contributed to by a number of authors, and each other has posts, and each post has tags. So I could potentially set up something like this:
class Author(db.Model):
owner = db.UserProperty()
class Post(db.Model):
owner = db.ReferenceProperty(Author,
collection_name='posts')
tags = db.StringListProperty()
As I understand this will create an entity group based on the Author parent. Does this cause inefficiency if I mostly need to query for Posts by tags which I expect to cut across multiple Authors?
I understand doing a query on list properties can be inefficient. Let's say each post has about 3 tags on average, but could go all the way up to 7. And I expect my collection of possible tags to be in the low hundreds. Is there any benefit to altering that model to something like this?
class Author(db.Model):
owner = db.UserProperty()
class Post(db.Model):
owner = db.ReferenceProperty(Author,
collection_name='posts')
tags = db.ListProperty(db.Key)
class Tag(db.Model):
name = db.StringProperty()
Or would I be better off doing something like this?
class Author(db.Model):
owner = db.UserProperty()
class Post(db.Model):
owner = db.ReferenceProperty(Author,
collection_name='posts')
class Tag(db.Model):
name = db.StringProperty()
class PostTag(db.Model):
post = db.ReferenceProperty(Post,
collection_name='posts')
tag = db.ReferenceProperty(Tag,
collection_name='tags')
And last question... what if my most common use case will be querying for posts by multiple tags. E.g., "find all posts with tags in {'apples', 'oranges', 'cucumbers', 'bicycles'}" Is one of these approaches more appropriate for a query that looks for posts that have any of a collection of tags?
Thanks, I know that was a mouthful. :-)
Something like the first or second approach are well suited for App Engine. Consider the following setup:
If you use the string tag (case-normalized) as the Tag entity key_name, you can efficiently query for posts with a specific tag, or list the tags of a post, or fetch tag statistics:
The third approach requires additional queries or fetches for most basic operations, and it is more difficult if you want to query for multiple tags.