I'm currently working on a load forecast based on GRU and LSTM. So far I have this:
- I developed a scraper that fetches the newest power system load data from a government website every 5 minutes, scheduled with a cron job
- I store the data in a PostgreSQL table with the timestamp and the load (the government site does not keep the data, so I store it myself)
- I built a Flask API with a few endpoints that query the data by timestamp (mywebsite.com/National/0/XXXXXX returns the national demand from timestamp 0 to the desired day)
- The preprocessing step of my model aggregates the data hourly with polars
- The model uses a Seq2Seq implementation as in Chapter 15 of HOML (Aurélien Géron), using TensorFlow
- The input window is 168 hours and the output is 24 hours, so the forecast is produced once a day, not continuously
- The error is acceptable for a multistep forecast (<4%)
- I can generate predictions (as an array) from new data (fetched from the API and preprocessed), but I don't have timestamps for them, so I plot them against hours 1 to 24 on the x-axis
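
A minimal sketch of how the 24 predictions could get timestamps, assuming the forecast hours are simply the 24 hours that follow the last hour of the 168-hour input window. The function name `forecast_timestamps`, the example date, and the placeholder predictions are hypothetical, not part of my current code:

```python
from datetime import datetime, timedelta, timezone

def forecast_timestamps(last_input_hour, horizon=24):
    """Return the hourly timestamps that follow the last hour of the input window."""
    return [last_input_hour + timedelta(hours=step) for step in range(1, horizon + 1)]

# Hypothetical example: the input window ends at 2024-01-01 23:00 UTC.
last_hour = datetime(2024, 1, 1, 23, tzinfo=timezone.utc)

# Placeholder for the array returned by the model (24 values).
predictions = [100.0 + i for i in range(24)]

# Pair every prediction with the hour it refers to.
stamped = list(zip(forecast_timestamps(last_hour), predictions))
```

With timestamps attached, the forecast could be plotted on the same time axis as the measured load instead of on hours 1 to 24.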
My question, which I have in mind but can't solve, is how to build a visualization in a web app that shows the expected load for the next 24 hours (the forecast) and updates the plot every hour or every 5 minutes, so that I can compute error metrics and visualize the error.
So I thought of the following:
- Add a column to the SQL table for the forecast data, or create a new table
- Change the whole SQL database to use a datetime index and expand the forecast to match the incoming data (one hourly sample corresponds to twelve 5-minute samples)
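
A rough sketch of the "new table" option, using an in-memory SQLite database as a stand-in for PostgreSQL and Unix-second integer timestamps. The table and column names (`load_5min`, `forecast_hourly`, `load_value`, `predicted`) are made up for illustration. Instead of expanding the forecast to 5-minute rows, this variant averages the twelve 5-minute samples of each hour at query time:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL database
conn.executescript("""
    CREATE TABLE load_5min (ts INTEGER PRIMARY KEY, load_value REAL NOT NULL);
    CREATE TABLE forecast_hourly (ts INTEGER PRIMARY KEY, predicted REAL NOT NULL);
""")

hour0 = 1_704_067_200  # Unix seconds, aligned to an hour boundary

# Twelve measured 5-minute samples for the first hour.
conn.executemany(
    "INSERT INTO load_5min (ts, load_value) VALUES (?, ?)",
    [(hour0 + 300 * i, float(100 + i)) for i in range(12)],
)
# Forecasts for that hour and for the next one, which has no measurements yet.
conn.executemany(
    "INSERT INTO forecast_hourly (ts, predicted) VALUES (?, ?)",
    [(hour0, 110.0), (hour0 + 3600, 112.0)],
)

# Integer division floors each sample to its hour; LEFT JOIN keeps future hours.
rows = conn.execute("""
    SELECT f.ts, f.predicted, a.actual
    FROM forecast_hourly AS f
    LEFT JOIN (
        SELECT (ts / 3600) * 3600 AS hour_ts, AVG(load_value) AS actual
        FROM load_5min
        GROUP BY hour_ts
    ) AS a ON a.hour_ts = f.ts
    ORDER BY f.ts
""").fetchall()
```

Here the forecast rows never share a primary key with the scraped rows, so neither writer has to match rows created by the other. Hours without measurements come back with `actual` set to `None`.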
The problem with both (from my point of view) is that the timestamp is my primary key, so the forecast module would have to create the future rows and the scraper would then have to match the timestamps already created. I don't know the exact future timestamps, because the scraper may run with a delay of 5-10 seconds, and I am afraid of the scraper failing and losing data.
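
One idea for the delay problem, sketched under assumptions: floor every scrape time to the start of its 5-minute slot before using it as the key, so a delay of a few seconds still maps to a predictable timestamp. SQLite again stands in for PostgreSQL (both accept `ON CONFLICT ... DO NOTHING`), and `floor_to_slot`, `store_sample`, and the table layout are hypothetical names:

```python
import sqlite3

SLOT = 300  # seconds in one 5-minute slot

def floor_to_slot(epoch_seconds, slot=SLOT):
    """Map a scrape time to the start of the 5-minute slot it falls in."""
    seconds = int(epoch_seconds)
    return seconds - seconds % slot

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL database
conn.execute("CREATE TABLE load_5min (ts INTEGER PRIMARY KEY, load_value REAL NOT NULL)")

def store_sample(conn, scraped_at, load_value):
    """Insert one sample; a second insert for the same slot is ignored."""
    conn.execute(
        "INSERT INTO load_5min (ts, load_value) VALUES (?, ?) "
        "ON CONFLICT(ts) DO NOTHING",
        (floor_to_slot(scraped_at), load_value),
    )
    conn.commit()

base = 1_704_067_200                       # Unix seconds on a slot boundary
store_sample(conn, base + 7, 101.5)        # scraper ran 7 seconds late
store_sample(conn, base + 9, 101.5)        # retry within the same slot
store_sample(conn, base + 300 + 5, 102.0)  # next slot, 5 seconds late

rows = conn.execute("SELECT ts, load_value FROM load_5min ORDER BY ts").fetchall()
```

Because the insert is idempotent, a retry after a failed run can be repeated without creating duplicate keys.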
What would be a good way to create future rows and fill them in correctly? How can I run tests to make sure that new data is not lost? And how can I visualize both the forecast and the incoming data on the same index, so that I can calculate the model's performance metrics?
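
To make the last point concrete, this is roughly how I imagine the metrics being computed once both series share the same timestamps. It is only a sketch: the function name `error_metrics` is hypothetical, and using MAE and MAPE is an assumption, since other metrics would work the same way:

```python
def error_metrics(forecast, actual):
    """Compare two {timestamp: value} mappings on the timestamps they share."""
    shared = sorted(set(forecast) & set(actual))
    # Skip zero measurements so the percentage error is always defined.
    pairs = [(forecast[ts], actual[ts]) for ts in shared if actual[ts] != 0]
    if not pairs:
        return None  # nothing to compare yet
    mae = sum(abs(f - a) for f, a in pairs) / len(pairs)
    mape = 100 * sum(abs(f - a) / abs(a) for f, a in pairs) / len(pairs)
    return {"n": len(pairs), "mae": mae, "mape_percent": mape}

# Hypothetical values keyed by Unix seconds; the last forecast hour has no measurement yet.
forecast = {0: 102.0, 3600: 196.0, 7200: 300.0}
actual = {0: 100.0, 3600: 200.0}

metrics = error_metrics(forecast, actual)
```

Only the hours present in both mappings are scored, so the metrics can be recomputed every time a new measured hour arrives.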
So far:
I tried to create a new table and match the timestamps, but it didn't work. At some point I also received a NaN from the API, and the runtime failed to produce any forecasts or scraped data for a day, until I fixed it manually.
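
For the NaN case, a guard along these lines might keep one bad value from stopping the whole run. This is a sketch under assumptions: `parse_load` and `clean_samples` are hypothetical helpers, and the sample list only imitates what the API could return:

```python
import math

def parse_load(raw):
    """Return the value as a finite float, or None if it is missing or not a number."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return None
    return value if math.isfinite(value) else None

def clean_samples(samples):
    """Split (timestamp, raw_value) pairs into usable samples and rejected timestamps."""
    good, rejected = [], []
    for ts, raw in samples:
        value = parse_load(raw)
        if value is None:
            rejected.append(ts)  # keep the timestamp so the gap can be logged or refilled
        else:
            good.append((ts, value))
    return good, rejected

# Hypothetical API payload containing a NaN and a missing value.
samples = [(0, "101.5"), (300, float("nan")), (600, None), (900, 99)]
good, rejected = clean_samples(samples)
```

The rejected timestamps are returned rather than raised as errors, so the scraper and the forecast job can continue and the gaps can be handled separately.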