Need help understanding the data structure for AutoML

I have an IoT device that updates an Azure Storage Table any time one of its values changes. For example, if the fish tank temperature changes from 68 to 69, that gets logged. If the filter pump runs, that gets logged. When the little treasure chest opens and bubbles come out, that gets logged. This makes my tabular data look like this:

TimeStamp  Name                 Value
(time)     TreaureChestBubbles  2.8
(time)     TreaureChestBubbles  5
(time)     FilterPumpRunning    1
(time)     TreaureChestBubbles  3.5
(time)     FilterPumpRunning    0
(time)     WaterTemp            66
(time)     TreaureChestBubbles  -1 (indicating an error)

I want to create a model that predicts when my little treasure chest is going to fail.

I dumped all this data into an AutoML job and clicked go... and it failed miserably. Then I started reading the documentation. I found lots of documentation about setting up experiments, but very little about the exact structure of the data. It looks like my tabular data needs to have EVERY parameter in each row? So instead of a Name column, I'd need a TreaureChestBubblesValue column, a WaterTempValues column, a FilterPumpRunningValues column, etc.

TimeStamp  TreaureChestBubblesValue  WaterTempValues  ...  FilterPumpRunningValues
(time)     2.8                       67               ...  0
(time)     5                         67               ...  0
(time)     5                         66               ...  0
(time)     8.4                       66               ...  1
(time)     2.8                       67               ...  0

Does that sound correct? Or does the structure of the data not matter for AutoML so long as it's tabular?

Tim (best answer)

Per the Azure Machine Learning documentation: https://learn.microsoft.com/en-us/azure/machine-learning/concept-automl-forecasting-methods#how-automl-uses-your-data

AutoML accepts time series data in tabular, "wide" format; that is, each variable must have its own corresponding column. AutoML requires one of the columns to be the time axis for the forecasting problem. This column must be parsable into a datetime type. The simplest time series data set consists of a time column and a numeric target column. The target is the variable one intends to predict into the future. The following is an example of the format in this simple case:

timestamp   quantity
2012-01-01  100
2012-01-02  97
2012-01-03  106
...         ...
2013-12-31  347
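
As a rough illustration of the "parsable into a datetime type" requirement, the check below uses pandas (an assumption; the quoted documentation does not prescribe a tool). The sample rows are copied from the table above, and the column names are simply the ones shown there.

```python
import pandas as pd

# Sample rows taken from the simple table above.
df = pd.DataFrame({
    "timestamp": ["2012-01-01", "2012-01-02", "2012-01-03"],
    "quantity": [100, 97, 106],
})

# pd.to_datetime raises by default if any value cannot be parsed,
# which surfaces bad timestamps before the data reaches AutoML.
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Sort on the time axis so the series is in chronological order.
df = df.sort_values("timestamp").reset_index(drop=True)

print(df.dtypes)
```

If the conversion raises, the offending values need cleaning before the column can serve as the time axis.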

In more complex cases, the data may contain other columns aligned with the time index.

timestamp    SKU     price  advertised  quantity
2012-01-01   JUICE1  3.5    0           100
2012-01-01   BREAD3  5.76   0           47
2012-01-02   JUICE1  3.5    0           97
2012-01-02   BREAD3  5.5    1           68
...          ...     ...    ...         ...
2013-12-31   JUICE1  3.75   0           347
2013-12-31   BREAD3  5.7    0           94

In this example, there's a SKU, a retail price, and a flag indicating whether an item was advertised in addition to the timestamp and target quantity. There are evidently two series in this dataset - one for the JUICE1 SKU and one for the BREAD3 SKU; the SKU column is a time series ID column since grouping by it gives two groups containing a single series each. Before sweeping over models, AutoML does basic validation of the input configuration and data and adds engineered features.
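
Applied to the data in the question, this means reshaping the long (TimeStamp, Name, Value) log into one column per sensor. Below is a minimal sketch with pandas, not an official AutoML recipe. The timestamps and the helper name `long_to_wide` are made up for illustration. Forward-filling is an assumption that fits a device which only logs on change: the last reported value is treated as still current. The -1 error code is not handled here and would need separate treatment (for example, as a failure label).

```python
import pandas as pd

# Hypothetical event log in the long format from the question.
long_df = pd.DataFrame({
    "TimeStamp": pd.to_datetime([
        "2024-01-01 00:00", "2024-01-01 00:05", "2024-01-01 00:05",
        "2024-01-01 00:10", "2024-01-01 00:15", "2024-01-01 00:20",
    ]),
    "Name": [
        "WaterTemp", "TreaureChestBubbles", "FilterPumpRunning",
        "TreaureChestBubbles", "FilterPumpRunning", "WaterTemp",
    ],
    "Value": [67, 2.8, 1, 5, 0, 66],
})


def long_to_wide(events: pd.DataFrame) -> pd.DataFrame:
    """Pivot a long event log into one column per sensor name."""
    wide = events.pivot_table(
        index="TimeStamp",
        columns="Name",
        values="Value",
        aggfunc="last",  # if a sensor logs twice at one timestamp, keep the last
    )
    # Carry the last known reading forward until the sensor reports again.
    wide = wide.sort_index().ffill()
    wide.columns.name = None
    return wide.reset_index()


wide_df = long_to_wide(long_df)
print(wide_df)
```

Rows before a sensor's first reading stay empty (NaN), since there is nothing to carry forward yet. Whether to drop or impute those rows depends on the dataset.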