I have a project that uses an event hub to receive data, this is sent every second, the data is received by a website using SignalR, this is all working fine, i have been storing the data in to blob storage via a Stream Analytics Job, but this is really slow to access, and with the amount of data i am receiving off just 6 devices, it will get even slower as this increases, i need to access the data to display historical data on via graphs on the website, and then this is topped up with the live data coming in.
I don't really need to store the data every second, so thought about only storing it every 30 seconds instead, but into a SQL DB, what i am trying to do, is still receive the data every second but only store it every 30, i have tried a tumbling window, but from what i can see, this just dumps everything every 30 seconds instead of the single entries.
am i miss understanding the Tumbling, Sliding and Hopping windows, i am guessing i cannot use them in this way ? if that is the case, i am guessing the only way to do it, would be to have the output db as an input, so i can cross reference the timestamp with the current time ?
unless anyone has any other ideas ? any help would be appreciated.
Thanks
You are correct that this will put all events within the Tumbling/Sliding/Hopping window together. However, this is only valid within a group by case, which requires a aggregate function over this group.
There is a aggregate function
Collect()
which will create an array of the events within a group.I think this should be possible when you group every event within a 30 second tumbling window using
Collect()
, then in the next step,CROSS APPLY
each record, which should output all received events within the 30 seconds.