I read that the DynamoDB Scan operation is slow when the table is large. But suppose I need to extract all the items: is it still preferable to avoid Scan? Since indexes are not free and I need every item in the table, I am going with this approach.

  1. Is there any problem with choosing the Scan operation?
  2. Why does only Scan have a parallel scan option? Is Query parallel by default?
  3. If I use the Query operation with pagination, will it run sequentially or in parallel?

Answer by Charles (best answer)

If you need all items, then Scan() is perfectly fine.

Just be aware of two things:

  • DynamoDB returns at most 1 MB of data per call, so you'll need to call Scan() in a loop, setting ExclusiveStartKey := LastEvaluatedKey each time until no LastEvaluatedKey comes back.
  • Scan() can quickly consume your provisioned RCUs, so watch for throttling errors and retry.
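
The loop described above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes a boto3 low-level DynamoDB client (or anything with a compatible `scan` method) is passed in, the helper name `scan_all` is made up for illustration, and throttling retries are left out for brevity.

```python
def scan_all(client, table_name, **scan_kwargs):
    """Yield every item in the table, following LastEvaluatedKey page by page.

    `client` is assumed to behave like boto3.client("dynamodb"):
    scan(**kwargs) returns a dict with "Items" and, when more pages
    remain, "LastEvaluatedKey".
    """
    kwargs = dict(scan_kwargs, TableName=table_name)
    while True:
        page = client.scan(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            # No LastEvaluatedKey means the scan reached the end of the table.
            return
        # Resume the next call where this page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

With boto3 installed and credentials configured, usage would look something like `items = list(scan_all(boto3.client("dynamodb"), "my-table"))`, where "my-table" is a placeholder table name.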

The recommendation against Scan() is aimed at using Scan() + filter in place of Query() to fetch a subset of records. Scan() always reads the full table; the filter is applied only after the items have been read.

Also note that from a performance standpoint, Scan() supports parallel scans.

TotalSegments
For a parallel Scan request, TotalSegments represents the total number of segments into which the Scan operation will be divided. The value of TotalSegments corresponds to the number of application workers that will perform the parallel scan. For example, if you want to use four application threads to scan a table or an index, specify a TotalSegments value of 4.

But again, if you are using provisioned reads, a parallel scan will eat up RCUs quickly.
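
A parallel scan along those lines might look like the sketch below. It assumes a boto3-style low-level client whose `scan` accepts the `Segment` and `TotalSegments` parameters, and that the client can be shared across threads; the helper names `scan_segment` and `parallel_scan` are made up for illustration, and throttling handling is again omitted.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(client, table_name, segment, total_segments):
    """Read one segment to the end, paginating with LastEvaluatedKey."""
    items = []
    kwargs = {
        "TableName": table_name,
        "Segment": segment,  # zero-based index of this worker's segment
        "TotalSegments": total_segments,
    }
    while True:
        page = client.scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        kwargs["ExclusiveStartKey"] = last_key


def parallel_scan(client, table_name, total_segments=4):
    """Scan all segments concurrently, one worker thread per segment."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, client, table_name, seg, total_segments)
            for seg in range(total_segments)
        ]
        # Items come back grouped by segment, not in any table-wide order.
        return [item for f in futures for item in f.result()]
```

Each worker keeps its own ExclusiveStartKey, so the segments paginate independently. More segments finish sooner but consume read capacity faster.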