How to: Self join Stream Analytics?

716 views Asked by At

Im trying to replace null values with last 10 seconds Average in stream analytics job. this requires a self join between the stream and the averages that i calculate in the With clause. It is giving me duplicates(i get same record twice or thrice)? Any suggestions on whats wrong or how to do it properly? My query is:

WITH MV AS ( Select AVG([Sensor_1]) AS [Sensor_1] From [input] GROUP BY SlidingWindow(second, 10))
SELECT [input].[ID]
,[input].[Timestamp]
,[input].[Result]
,CASE 
    WHEN [input].[Sensor_1] = 0 
    THEN [MV].[Sensor_1] ELSE [input].[Sensor_1] 
    END [Sensor_1]
,[input].[Sensor_2]
,[input].[Sensor_3]
FROM [input]
LEFT OUTER JOIN [MV]
ON DateDiff(second, [input], [MV]) BETWEEN 0 AND 10
1

There are 1 answers

0
Ziv Kasperski On

Sorry for the delay in responding on this.

The simplest solution is to change ON DateDiff(second, [input], [MV]) BETWEEN 0 AND 10 to ON DateDiff(millisecond, [input], [MV]) = 0.

This is because the timestamps given in the MV step are of the last event that went into the SlidingWindow and those would match the timestamp on the event in Input (note: the smaller the time unit the better for the match but if you are using the in-browser-testing-experience then millisecond is the smallest supported unit).

Do note that while here we can remove duplicates by removing needless matches in the JOIN, in general Stream Analytics has no mechanism to remove duplicates via DISTINCT or anything like that.

Ziv.