AWS Timestream database


In the Timestream database pricing documentation, I came across this sentence:

The dashboard queries contain essential dimensions and measures and relevant predicates, so Amazon Timestream’s distributed query engine can aggressively prune irrelevant data and scan approximately 2% of the data accumulated over the past six hours.

Could you please provide a more in-depth explanation of what the 2% represents? Why is the reference to the past six hours of data accumulation significant? Additionally, what types of queries confirm this behavior, and how can I verify that only 2% of my data is being scanned?

During my testing, I noticed that when running an aggregate query like sum(weight) over one day, the bytes scanned were about 50% of my entire dataset, which seems contrary to the roughly 2% described in the documentation.
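
For context, here is roughly how I check the bytes a query scans, by reading the QueryStatus block that the Timestream Query API returns; the database, table, and measure names below are placeholders for my setup:

```python
import boto3

# Hypothetical database/table names; replace with your own.
DATABASE = "sensor_db"
TABLE = "weights"

query_client = boto3.client("timestream-query")

# An aggregate over one day, similar to the test described above.
QUERY = f"""
    SELECT SUM(measure_value::double) AS total_weight
    FROM "{DATABASE}"."{TABLE}"
    WHERE measure_name = 'weight'
      AND time > ago(1d)
"""

# Page through the full result set; QueryStatus is cumulative, so the
# values on the last page cover the whole query.
next_token = None
while True:
    kwargs = {"QueryString": QUERY}
    if next_token:
        kwargs["NextToken"] = next_token
    response = query_client.query(**kwargs)
    next_token = response.get("NextToken")
    if not next_token:
        break

status = response["QueryStatus"]
print("Bytes scanned:", status["CumulativeBytesScanned"])
print("Bytes metered:", status["CumulativeBytesMetered"])
```

CumulativeBytesScanned is what I compared against my total table size; CumulativeBytesMetered is what billing is based on.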

1 Answer

Answered by Leeroy Hannigan:

Those figures are clearly identified as assumptions:

  • 2% Scanning of Data: The statement says that Amazon Timestream's distributed query engine scans approximately 2% of the data accumulated over the past six hours. This figure describes how much data the engine has to scan to answer the example dashboard queries: because those queries carry time bounds, dimensions, and selective predicates, only about 2% of the stored data needs to be examined to produce the results. How much pruning you actually get depends on the query optimization techniques and indexes used within Amazon Timestream, and on how selective your own predicates are (a sketch of such a predicate-bounded query follows this list).

  • 6 Hours of Data in Memory Store: Amazon Timestream allows you to configure different storage tiers for your time-series data. In this case, it's configured to store six hours' worth of data in the memory store. The memory store is typically optimized for high-performance and low-latency access, making it suitable for storing recent data that needs to be readily available for querying and dashboard purposes.
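
As a rough illustration of the first point, a dashboard-style query that gives the engine something to prune on might look like the sketch below; the database, table, dimension, and measure names are hypothetical:

```python
# Hypothetical dashboard query: the time bound, the measure_name filter, and
# the dimension predicate are what let Timestream skip (prune) data that
# cannot possibly match, so only a small fraction of the table is scanned.
DASHBOARD_QUERY = """
    SELECT bin(time, 1m) AS minute,
           AVG(measure_value::double) AS avg_weight
    FROM "sensor_db"."weights"
    WHERE time > ago(6h)              -- only the most recent six hours
      AND measure_name = 'weight'     -- a single measure
      AND device_id = 'scale-001'     -- a single dimension value
    GROUP BY bin(time, 1m)
    ORDER BY minute
"""
```

An unbounded aggregate such as SUM(weight) over a whole day gives the engine far less to prune on, which is one reason it can end up scanning a much larger share of the table; the 2% is an assumption about how selective the example dashboard queries are, not a guarantee.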
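
The six-hour memory store in the second point is simply a retention setting on the table. A minimal sketch of configuring it with boto3, assuming the database and table already exist, would be:

```python
import boto3

write_client = boto3.client("timestream-write")

# Hypothetical database/table names. Keep six hours of data in the memory
# store and a year in the magnetic store, matching the pricing example.
write_client.update_table(
    DatabaseName="sensor_db",
    TableName="weights",
    RetentionProperties={
        "MemoryStoreRetentionPeriodInHours": 6,
        "MagneticStoreRetentionPeriodInDays": 365,
    },
)
```

Queries whose time predicate falls within that six-hour window are served from the memory store; older data is read from the magnetic store.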