QuickSight: fill missing days

We have RAW DATA that shows how a value changes over time (per "machine" in our example). A row exists only for days on which the value (the "not_processed" column) changed.

I assume many others have a similar need when working with data that changes over time. We would like to go with Option 1 if at all possible. If that doesn't work, we will probably be forced to go with Option 2.

Our plan: Option 1.
We would like to have a QuickSight pivot table showing the changes over time. There are no separate records for every day, so for a day without a record the pivot table should show the most recent earlier value.

Here is an example of what the RAW DATA looks like and how we would like the pivot table to look: [image]

Is there any way on the QuickSight side to handle this situation and produce a pivot table like the one in the sample above?

Backup plan: Option 2
Of course, one option is to create one row per day and machine in the RAW DATA. In other words, fill in the missing days in the RAW DATA. However, this means we would have to create a lot of near-identical records just for the QuickSight report to work, and the amount of data would increase considerably.

If we are forced to go with this plan, it would be good to have some kind of tool that would "extend" our RAW DATA.

Are there any tools on the AWS side that could be used to "fill missing days" in the RAW DATA? If so, how? Any samples would be very helpful.

EDIT: If I add the data to a QuickSight pivot table, it does not show the missing days. See: [image]

There is 1 answer

amsh On

[Edit: This solution needs a prerequisite due to new information]

You can go with Option 1 and create a pivot table. For a pivot table to work, you have to choose at least one column (in your case, machine) and then choose the pivot table icon.

More details are on the official AWS blog.

[Prerequisite]

The file needs to include the missing data for the pivot table to work as expected. Option 2 is to use an AWS tool to do that. The tool that runs the script (in pandas, petl, etc.) for inserting the missing data can be:

  • AWS Lambda with any of the supported runtimes, scheduled at 11:55 PM.
  • AWS Glue (scheduled) with PySpark, if the data is really big.
  • Scheduled ECS Fargate (which supports any runtime, thanks to Docker), for medium-sized data that may need more than 15 minutes to process (the Lambda timeout).

I would not recommend any of these, because there will be development overhead and you will have to keep the files on S3 (if they are not there already).
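
If you do go the script route anyway, a gap-filling script in pandas could look roughly like the sketch below. It assumes the RAW DATA has columns named machine, date and not_processed (the date column name is my guess, and fill_missing_days is just an illustrative name), and it carries the last known value forward for each machine, which is the behaviour the question asks for. Treat it as a starting point rather than a finished job.

```python
import pandas as pd


def fill_missing_days(df, end_date=None):
    """Return one row per machine per calendar day, carrying the last value forward.

    Assumed columns: "machine", "date", "not_processed".
    """
    df = df.copy()
    df["date"] = pd.to_datetime(df["date"])
    # Fill up to end_date, or up to the latest date present in the data.
    end = pd.to_datetime(end_date) if end_date is not None else df["date"].max()

    filled = []
    for machine, group in df.groupby("machine"):
        # If a machine has several rows on one day, keep the last one.
        series = (
            group.sort_values("date")
            .drop_duplicates("date", keep="last")
            .set_index("date")["not_processed"]
        )
        # Every calendar day from this machine's first record to the end date.
        all_days = pd.date_range(series.index.min(), end, freq="D")
        # Missing days become NaN after reindex; ffill copies the previous value.
        daily = series.reindex(all_days).ffill().astype(series.dtype)
        out = daily.rename_axis("date").reset_index()
        out.insert(0, "machine", machine)
        filled.append(out)
    return pd.concat(filled, ignore_index=True)


# Made-up sample data, only to show the shape of the input.
raw = pd.DataFrame(
    {
        "machine": ["A", "A", "B"],
        "date": ["2021-01-01", "2021-01-04", "2021-01-02"],
        "not_processed": [5, 3, 7],
    }
)
result = fill_missing_days(raw)
```

The same function body could run inside Lambda or a Fargate task; for Glue with PySpark the logic would need to be rewritten with Spark APIs.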

Recommended: You can simply update your code to check whether the last row added for that machine is from the previous day; if not, add dummy rows for all the missing days before you add the record for the current day. You will have to include a starting date in the configuration in case the file is empty.
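
A minimal sketch of that check, using only the standard library. I don't know how your code reads or writes the file, so the lookup of the last row is left to you; rows_to_append, last_row and initial_value are hypothetical names, and the value used for dummy rows when the file is empty (initial_value) is an assumption you would need to decide on.

```python
from datetime import date, timedelta


def rows_to_append(last_row, machine, today, value, start_date, initial_value=0):
    """Build the rows to write for `machine` on `today`, padding skipped days.

    last_row: (date, value) of the most recent row for this machine,
              or None if nothing has been written yet.
    start_date: configured first day to use when there is no previous row.
    initial_value: assumed placeholder for dummy rows when there is no previous row.
    """
    if last_row is None:
        # Empty file: start padding from the configured start date.
        next_day, carried = start_date, initial_value
    else:
        last_day, carried = last_row
        next_day = last_day + timedelta(days=1)

    rows = []
    # Dummy rows repeat the previous value for every day that was skipped.
    while next_day < today:
        rows.append((machine, next_day, carried))
        next_day += timedelta(days=1)

    # Finally, the real record for the current day.
    rows.append((machine, today, value))
    return rows


# Last record was on Jan 1 with value 5; today is Jan 4 and the value is now 3.
example = rows_to_append(
    (date(2021, 1, 1), 5), "A", date(2021, 1, 4), 3, start_date=date(2021, 1, 1)
)
# No previous record at all: pad from the configured start date.
first_run = rows_to_append(
    None, "A", date(2021, 1, 3), 4, start_date=date(2021, 1, 1)
)
```

Note that if a row for the current day already exists, this sketch still returns a row for today, so the caller has to decide whether to overwrite or skip it.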