How to achieve sorting by any attribute of an item in DynamoDB

8.5k views Asked by At

I have a DynamoDB structure as following.

  1. I have patients with patient information stored in its documents.
  2. I have claims with claim information stored in its documents.
  3. I have payments with payment information stored in its documents.
  4. Every claim belongs to a patient. A patient can have one or more claims.
  5. Every payment belongs to a patient. A patient can have one or more payments.

I created only one DynamoDB table since all of aws dynamodb documentations indicates using only one table if possible is the best solution. So I end up with following : BASE TABLE

In this table ID is the partition key and EntryType is the sortkey. Every claim and payment holds its owner. My access patterns are as following :

  1. Listing all patients in the DB with pagination with patients sorted on creation dates.
  2. Listing all claims in the DB with pagination with claims sorted on creation dates.
  3. Listing all payments in the DB with pagination with payments sorted on creation dates.
  4. Listing claims of a particular patient.
  5. Listing payments of a particular patient.

I can achieve these with two global secondary indexes. I can list patients, claims and payments sorted by their creation date by using a GSI with EntryType as a partition key and CreationDate as a sort key. Also I can list a patient's claims and payments by using another GSI with EntryType partition key and OwnerID sort key.

My problem is this approach brings me only sorting with creation date. My patients and claims have much more attributes (around 25 each) and I need to sort them according to each of their attribute as well. But there is a limit on Amazon DynamoDB that every table can have at most 20 GSI. So I tried creating GSI's on the fly (dynamically upon the request) but that also ended very inefficiently since it copies the items to another partition to create a GSI (as far as I know). So what is the best solution to sort patients by their patient name, claims by their claim description and any other fields they have?

1

There are 1 answers

2
Seth Geoghegan On

Sorting in DynamoDB happens only on the sort key. In your data model, your sort key is EntryType, which doesn't support any of the access patterns you've outlined.

You could create a secondary index on the fields you want to sort by (e.g. creationDate). However, that pattern can be limiting if you want to support sorting by many attributes.

I'm afraid there is no simple solution to your problem. While this is super simple in SQL, DynamoDB sorting just doens't work that way. Instead, I'll suggest a few ideas that may help get you unstuck:

  • Client Side Sorting - Use DDB to efficiently query the data your application needs, and let the client worry about sorting the data. For example, if your client is a web application, you could use javascript to dynamically sort the fields on the fly, depending on which field the user wants to sort by.
  • Consider using KSUIDs for your IDs - I noticed most of your access patterns involves sorting by CreationDate. The KSUID, or K-Sortable Globally Unique Id's, is a globally unique ID that is sortable by generation time. It's a great option when your application needs to create unique IDs and sort by a creation timestamp. If you build a KSUID into your sort keys, your query results could automatically support sorting by creation date.
  • Reorganize Your Data - If you have the flexibility to redesign how you store your data, you could accommodate several of your access patterns with fewer secondary indexes (example below).

Finally, I notice that your table example is very "flat" and doesn't appear to be modeling the relationships in a way that supports any of your access patterns (without adding indexes). Perhaps it's just an example data set to highlight your question about sorting, but I wanted to address a different way to model your data in the event you are unfamiliar with these patterns.

For example, consider your access patterns that require you to fetch a patient's claims and payments, sorted by creation date. Here's one way that could be modeled:

Patient relationships

This design handles four access patterns:

  1. get patient claims, sorted by date created.
  2. get patient payments, sorted by date created.
  3. get patient info (names, etc...)
  4. get patient claims, payments and info (in a single query).

The queries would look like this (in pseudocode):

  1. query where PK = "PATIENT#UUID1" and SK < "PATIENT#UUID1"
  2. query where PK = "PATIENT#UUID1" and SK > "PATIENT#UUID1"
  3. query where PK = "PATIENT#UUID1" and SK = "PATIENT#UUID1"
  4. query where PK = "PATIENT#UUID1"

These queries take advantage of the sort keys being lexicographically sorted. When you ask DDB to fetch the PATIENT#UUID1 partition with a sort key less than "PATIENT#UUID1", it will return only the CLAIM items. This is because CLAIMS comes before PATIENT when sorted alphabetically. The same pattern is how I access the PAYMENT items for the given patient. I've used KSUIDs in this scenario, which gives you the added feature of having the CLAIMS and PAYMENT items sorted by creation date!

While this pattern may not solve all of your sorting problems, I hope it gives you some ideas of how you can model your data to support a variety of access patterns with sorting functionality as a side effect.