Querying a Global Secondary Index of a DynamoDB table without using the partition key


I have a DynamoDB table with partition key as userID and no sort key. The table also has a timestamp attribute in each item. I wanted to retrieve all the items having a timestamp in the specified range (regardless of userID i.e. ranging across all partitions). After reading the docs and searching Stack Overflow (here), I found that I need to create a GSI for my table. Hence, I created a GSI with the following keys:

  • Partition Key: userID
  • Sort Key: timestamp

I am querying the index with Java SDK using the following code:

String lastWeekDateString = getLastWeekDateString();
AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().build();
DynamoDB dynamoDB = new DynamoDB(client);

Table table = dynamoDB.getTable("user table");
Index index = table.getIndex("userID-timestamp-index");

QuerySpec querySpec = new QuerySpec()
    .withKeyConditionExpression("timestamp > :v_timestampLowerBound")
    .withValueMap(new ValueMap()
            .withString(":v_timestampLowerBound", lastWeekDateString));

ItemCollection<QueryOutcome> items = index.query(querySpec);
Iterator<Item> iter = items.iterator();

while (iter.hasNext()) {
    Item item = iter.next();
    // extract item attributes here
}

I am getting the following error on executing this code:

Query condition missed key schema element: userID

From what I know, I should be able to query the GSI using only the sort key without giving any condition on the partition key. Please help me understand what is wrong with my implementation. Thanks.

Edit: After reading the thread here, it turns out that we cannot query a GSI with only a range on the sort key. So, what is the alternative, if any, to query the entire table by a range query on an attribute? One suggestion I found in that thread was to use year as the partition key. This will require multiple queries if the desired range spans multiple years. Also, this does not distribute the data uniformly across all partitions, since only the partition corresponding to the current year will be used for insertions for one full year. Please suggest any alternatives.


There are 2 answers

1
Narek Daduryan On BEST ANSWER

When using the DynamoDB Query operation, you must specify at least the partition key. This is why you get the error saying that userID is required. From the AWS Query docs:

The condition must perform an equality test on a single partition key value.

The only way to get items without the partition key is a Scan operation (but the results won't be sorted by your sort key!)
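To make the difference between Query and Scan concrete, here is a toy in-memory model. It is only a mental model and not how DynamoDB is implemented; the class name ToyIndex and its methods are made up for illustration. Each partition key value owns its own collection of items sorted by sort key, so a range condition on the sort key only makes sense once a partition has been selected.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model (hypothetical, not the DynamoDB implementation): one sorted
// collection of items per partition key value.
final class ToyIndex {
    private final Map<String, TreeMap<String, String>> partitions = new HashMap<>();

    void put(String partitionKey, String sortKey, String payload) {
        partitions.computeIfAbsent(partitionKey, k -> new TreeMap<>()).put(sortKey, payload);
    }

    // Query: the partition key picks exactly one sorted collection, and the
    // sort key condition narrows it to a range. Results come back in sort key order.
    List<String> query(String partitionKey, String sortKeyLowerBoundExclusive) {
        TreeMap<String, String> partition = partitions.get(partitionKey);
        if (partition == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(partition.tailMap(sortKeyLowerBoundExclusive, false).values());
    }

    // Scan: visits every partition and filters. Nothing orders the combined result.
    List<String> scan(String sortKeyLowerBoundExclusive) {
        List<String> out = new ArrayList<>();
        for (TreeMap<String, String> partition : partitions.values()) {
            out.addAll(partition.tailMap(sortKeyLowerBoundExclusive, false).values());
        }
        return out;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.put("u1", "2021-01-01", "a");
        index.put("u1", "2021-02-01", "b");
        index.put("u2", "2021-01-15", "c");
        System.out.println(index.query("u1", "2021-01-10")); // only items in partition u1
        System.out.println(index.scan("2021-01-10"));        // items from all partitions, order not guaranteed
    }
}
```

In this model a single partition key value shared by all items turns the whole index into one sorted collection, which is what lets a range query cover every item.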

If you want to get all the items sorted, you would have to create a GSI with a partition key that will be the same for all items you need (e.g. create a new attribute on all items, such as "type": "item"). You can then query the GSI and specify #type=:item

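// "type" and "timestamp" are DynamoDB reserved words, so both are referenced
// through expression attribute name placeholders (#type, #ts).
QuerySpec querySpec = new QuerySpec()
    .withKeyConditionExpression("#type = :item AND #ts > :v_timestampLowerBound")
    .withNameMap(new NameMap()
            .with("#type", "type")
            .with("#ts", "timestamp"))
    .withValueMap(new ValueMap()
            .withString(":v_timestampLowerBound", lastWeekDateString)
            .withString(":item", "item"));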

0
harley On

A good solution for any custom querying requirement with DynamoDB is to design the right primary key schema for the GSI. The main principle when designing a DynamoDB primary key is that the hash key should partition the whole set of items, and the sort key should order the items within each partition.

Having said that, I recommend using the year of the timestamp as the hash key and the month-date as the sort key. In that case you need at most 2 queries, as long as the range spans no more than two calendar years. You are right that you should avoid filtering or scanning as much as you can.

For example, if the start date and the end date fall in the same year, you need only one query. A key condition allows only one condition on the sort key, so the range has to be written with BETWEEN (which is inclusive), and placeholder names cannot contain hyphens:

.withKeyConditionExpression("#year = :year and #monthDate between :startMonthDate and :endMonthDate")

Otherwise, query the start year like this:

.withKeyConditionExpression("#year = :startYear and #monthDate > :startMonthDate")

and the end year like this:

.withKeyConditionExpression("#year = :endYear and #monthDate < :endMonthDate")

Finally, union the result sets from both queries. This takes at most 2 Query operations; the read capacity consumed depends on how much data the queries read.
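The per-year split can be computed before any request is sent. The following is a minimal sketch, assuming a GSI whose hash key is the year and whose sort key is a zero-padded month-date string in MM-dd format; the class YearRangeSplitter, the YearSlice type and its field names are hypothetical and not part of any SDK. Each slice carries inclusive bounds, so adjust them if you need exclusive comparisons.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns a date range into one set of key values per calendar year.
final class YearRangeSplitter {

    // Key values for a single query: year = :year, monthDate from :from to :to (inclusive).
    static final class YearSlice {
        final int year;
        final String fromMonthDate;
        final String toMonthDate;

        YearSlice(int year, String fromMonthDate, String toMonthDate) {
            this.year = year;
            this.fromMonthDate = fromMonthDate;
            this.toMonthDate = toMonthDate;
        }

        @Override
        public String toString() {
            return year + ": " + fromMonthDate + " .. " + toMonthDate;
        }
    }

    // Zero-padded MM-dd strings sort lexicographically in calendar order.
    private static final DateTimeFormatter MONTH_DATE = DateTimeFormatter.ofPattern("MM-dd");

    static List<YearSlice> split(LocalDate start, LocalDate end) {
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end must not be before start");
        }
        List<YearSlice> slices = new ArrayList<>();
        for (int year = start.getYear(); year <= end.getYear(); year++) {
            // The first year starts at the requested date; later years start on Jan 1.
            String from = (year == start.getYear()) ? start.format(MONTH_DATE) : "01-01";
            // The last year ends at the requested date; earlier years end on Dec 31.
            String to = (year == end.getYear()) ? end.format(MONTH_DATE) : "12-31";
            slices.add(new YearSlice(year, from, to));
        }
        return slices;
    }

    public static void main(String[] args) {
        // A one-week range that crosses a year boundary yields two slices.
        for (YearSlice slice : split(LocalDate.of(2020, 12, 28), LocalDate.of(2021, 1, 3))) {
            System.out.println(slice);
        }
    }
}
```

Each slice would then feed one query against the index, and the results are concatenated in slice order. A range inside a single year produces exactly one slice, and a range spanning more than two calendar years produces more than two.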

For easier comparison of the sort key, you might want to store the timestamp as a UNIX timestamp.
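As a small illustration of why a numeric timestamp compares more reliably than some string formats, the sketch below contrasts an unpadded date string with epoch seconds. The class name EpochSortKey is hypothetical, and the input is assumed to be an ISO-8601 UTC instant.

```java
import java.time.Instant;

// Illustrative helper (hypothetical name): derives a numeric sort key from an ISO-8601 instant.
final class EpochSortKey {

    static long toEpochSeconds(String isoInstant) {
        return Instant.parse(isoInstant).getEpochSecond();
    }

    public static void main(String[] args) {
        // Unpadded date strings compare character by character, not chronologically.
        String sept30 = "9/30/2021";
        String oct1 = "10/1/2021";
        System.out.println("string order puts Sept 30 after Oct 1: " + (sept30.compareTo(oct1) > 0));

        // Epoch seconds compare numerically, which matches chronological order.
        long a = toEpochSeconds("2021-09-30T00:00:00Z");
        long b = toEpochSeconds("2021-10-01T00:00:00Z");
        System.out.println("epoch order puts Sept 30 before Oct 1: " + (a < b));
    }
}
```

Zero-padded ISO-8601 strings also sort chronologically as plain strings, so the numeric form mainly helps when the stored format is not already sortable.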

Thanks