App engine datastore denormalization: index properties in the main entity or the denormalized entity?

Question

App engine datastore denormalization: index properties in the main entity or the denormalized entity?

250 views Asked by ali At 19 June 2015 at 00:32

Consider the classic example of blog data modelling, where we have a Blog entity with many properties, and we want to list the latest blogs in a page.

It makes sense to denormalize the BlogPost entity into a BlogPostSummary entity which will be shown in the list view, avoiding fetching and deserializing many unwanted properties.

class BlogPost(db.Model):
  title = db.StringProperty()
  content = db.TextProperty()
  created = db.DateProperty()
  ...

class BlogPostSummary(db.Model):
  title = db.StringProperty()
  content_excerpt = db.TextProperty()

The question is: which entity should hold the indexed properties? There are 3 options:

1. Index properties in both

Pros:
- Easy query on both entities.
Cons:
- Maintaining denormalized indexes is expensive.

2. Index properties in main entity only

Pros:
- Indexing properties in the main entity is more safe, as the denormalized entity is treated as redundancy.
Cons:
- Querying the list view will need a double roundtrip to datastore: One to key-only query for BlogPost entities, followed by a batch get for BlogPostSummary.

3. Index in denormalized entity only

Pros:
- The list view can be easily built by a single query.
Cons:
- The main entity cannot be queried by those properties anymore.
- The indexes occupy more space when the denormalized entity is a child of the main entity.

Which option would work better? Are there other options?

Would the double round trip to datastore in option 2 be a problem?

Original Q&A

There are 2 answers

**Andrei Volgin** · Answer 1 · 2015-06-19T00:53:04+00:00

This is an abstract question that does not have a "correct" answer. The choice of a data model depends on specific requirements of a project, including:

usage patterns (how often you need access to different data)
update patterns (e.g. separating frequently updated properties from stable properties in order to reduce write costs)
average performance and extreme-case performance requirements (e.g an average blog may have 10 posts, a very popular blog may have 10,000 posts)
ability to use memcache to reduce datastore trips and improve performance
data complexity (i.e. how many different kids of entities depend on this particular entity kind)
transaction requirements
security and access roles considerations (e.g. not exposing private data by mistake)

By the way, there is another way to model data in the Datastore - using child entities. For example, blog posts may be child entities of a blog entity. This way you can retrieve all blog posts with a single query by providing a parent key - without storing post IDs or keys in the blog entity or blog ID/key in the post entities.

**Brent Washburne** · Answer 2 · 2015-06-19T17:21:20+00:00

If you want to avoid fetching and deserializing unwanted properties, then put all your entities into a single model and use Projection queries:

class BlogPost(db.Model):
  title = db.StringProperty(indexed=True)
  content = db.TextProperty(indexed=False)
  content_excerpt = db.TextProperty(indexed=False)
  created = db.DateProperty(indexed=True)
  ...

posts = BlogPost.query().fetch(projection=["title", "created"])

(Note that the projection fields must be indexed.)

TechQA.

App engine datastore denormalization: index properties in the main entity or the denormalized entity?

1. Index properties in both

2. Index properties in main entity only

3. Index in denormalized entity only

There are 2 answers

Related Questions in GOOGLE-APP-ENGINE

Related Questions in DATA-MODELING

Related Questions in GOOGLE-CLOUD-DATASTORE

Related Questions in APP-ENGINE-NDB

Popular Questions

Popular Tags

Trending Questions