How to create a DynamoDB Global Secondary Index with a hash of multiple fields?


To me, the word "hash" conveys that it IS possible to have a hash key consisting of multiple fields within DynamoDB. However, every article I find shows the "hash" consisting of only a single value, which doesn't make any sense to me.

My table consists of the following fields:

  • uid (PK)
  • provider
  • identifier
  • from
  • to
  • date_received
  • date_processed

The goal is to have multiple indexes based on how my app will retrieve data (other than by the PK, of course). The combinations are:

  1. By the provider's message identifier:
    Desired hash: provider + identifier

  2. By the conversation message identifier:
    Desired hash: from + to

  3. By the date received and whether it is processed:
    Desired hash: _ac

  4. By the date received and whether it is processed:
    Desired hash: account

Here's one example of what I've tried that was not successful:

  MessagesTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: messages
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: uid
          AttributeType: S
        - AttributeName: account
          AttributeType: S
        - AttributeName: provider
          AttributeType: S
        - AttributeName: identifier
          AttributeType: S
        - AttributeName: from
          AttributeType: N
        - AttributeName: to
          AttributeType: N
        - AttributeName: _ac
          AttributeType: N
        - AttributeName: _ap
          AttributeType: N
      KeySchema:
        - AttributeName: uid
          KeyType: HASH
      GlobalSecondaryIndexes:
        - IndexName: idxConversation
          KeySchema:
            - AttributeName: from:to
              KeyType: HASH
            - AttributeName: _ac
              KeyType: RANGE
          Projection:
            ProjectionType: KEYS_ONLY
        - IndexName: idxProviderMessage
          KeySchema:
            - AttributeName: provider:identifier
              KeyType: HASH
            - AttributeName: _ac
              KeyType: RANGE
          Projection:
            ProjectionType: KEYS_ONLY

There is 1 answer

Answer by Charles (best answer)

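That's not the way DDB works. A key attribute has to be a single attribute stored on the item; DDB won't combine several attributes into one key for you.

With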

from: "[email protected]"
to: "[email protected]"

You'd want to have another attribute in the record

gsiHash: "[email protected]#[email protected]"

That's the attribute that you'd specify as the GSI hash key.

Note that in order to access the data via this GSI, you'd need to know both from and to.
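
As a minimal sketch of what that could look like in the CloudFormation template from the question: the fragment below swaps the `from:to` key for a single `gsiHash` attribute. It assumes the application itself writes `gsiHash` (as "<from>#<to>") on every item, and it reuses the `idxConversation` name and the `_ac` sort key from the question. Only attributes used in a key schema are listed under AttributeDefinitions; the other fields can still be stored on the items without being declared.

```yaml
MessagesTable:
  Type: AWS::DynamoDB::Table
  Properties:
    TableName: messages
    BillingMode: PAY_PER_REQUEST
    AttributeDefinitions:
      # Declare key attributes only (table key plus GSI keys).
      - AttributeName: uid
        AttributeType: S
      # Assumption: the app stores "<from>#<to>" in this attribute.
      - AttributeName: gsiHash
        AttributeType: S
      - AttributeName: _ac
        AttributeType: N
    KeySchema:
      - AttributeName: uid
        KeyType: HASH
    GlobalSecondaryIndexes:
      - IndexName: idxConversation
        KeySchema:
          - AttributeName: gsiHash
            KeyType: HASH
          - AttributeName: _ac
            KeyType: RANGE
        Projection:
          ProjectionType: KEYS_ONLY
```

Items that lack `gsiHash` or `_ac` simply do not appear in the index, so the attribute has to be populated at write time for every message you want to find this way.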

In your case, you may want to take a cue from the Overloading Global Secondary Indexes page of the DDB docs.

Instead of writing a single record, you'd write multiple records to a table keyed like this:

s: id, keytype: hash  
s: data, keytype: sort  
s: gsi-sk  

The records would look like:

id:"<uid>",data:"PRIMARY", gsi-sk:"<?>" //"primary" record  
id:"<uid>",data:"FROM", gsi-sk:"[email protected]"
id:"<uid>",data:"TO", gsi-sk:"[email protected]"
id:"<uid>",data:"FROMTO", gsi-sk:"[email protected]#[email protected]"
id:"<uid>",data:"PROVIDER", gsi-sk:"whateverid"
<etc>

Now you create a GSI with data as the hash key, and gsi-sk as the sort key.
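
A sketch of that overloaded layout as a CloudFormation resource might look like the following. The attribute names `id`, `data`, and `gsi-sk` come from the layout above; the index name `idxOverloaded` is a hypothetical placeholder, and the table name and billing mode are carried over from the question.

```yaml
MessagesTable:
  Type: AWS::DynamoDB::Table
  Properties:
    TableName: messages
    BillingMode: PAY_PER_REQUEST
    AttributeDefinitions:
      - AttributeName: id
        AttributeType: S
      - AttributeName: data
        AttributeType: S
      - AttributeName: gsi-sk
        AttributeType: S
    KeySchema:
      # Table key: one item per (id, data) pair.
      - AttributeName: id
        KeyType: HASH
      - AttributeName: data
        KeyType: RANGE
    GlobalSecondaryIndexes:
      # Hypothetical index name; the GSI inverts the lookup direction.
      - IndexName: idxOverloaded
        KeySchema:
          - AttributeName: data
            KeyType: HASH
          - AttributeName: gsi-sk
            KeyType: RANGE
        Projection:
          ProjectionType: KEYS_ONLY
```

With KEYS_ONLY, a query on the index returns `id`, `data`, and `gsi-sk`, which is enough to look up the primary record by `id` afterwards. Note that `data` is a DynamoDB reserved word, so query expressions would need an expression attribute name placeholder for it.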

Expanding on my comment: alternatively, you might expand what you put into "data":

id:"<uid>",data:"PRIMARY", gsi-sk:"<?>" //"primary" record  
id:"<uid>",data:"FROM#[email protected]", gsi-sk:"TO#[email protected]"
id:"<uid>",data:"TO#[email protected]", gsi-sk:"FROM#[email protected]"
id:"<uid>",data:"PROVIDER#<whateverid>", gsi-sk:"IDENTIFIER#<someid>"
<etc>

How much of the data you leave in the primary record depends on your access requirements. Do you want to be able to get everything with a GetItem(hk=<uid>, sk="PRIMARY"), or is a Query(hk=<uid>) acceptable?