I humbly request you to help me with the below scenario, In my project I have to process time series data from a table. We are using Azure SQL Server.
The table dbo.batch_events
has 1400 million rows, please see the below screenshot for the table structure and sample data:
Source table structure and sample data
I have to pivot the equipment name from the equipment_name
column in the dbo.batch_events
table into column names and load the pivoted value into dbo.time_series
table .
If I pivot the equipment name column, I am getting 685 columns to be created in the dbo.time_series
table.
Please see the below screenshot showing the target table (dbo.time_series
) structure and expected output for the sample data shown in the source table.
Target table structure and sample data
Please kindly advise what is the best approach and method to write the query in SQL.
The query I wrote is taking 25 hours to process the 1400 million records and load them into the target table.
I have created day wise partition on time_stamp column for the source table (dbo.batch_events
) and created two nonclustered indexes - one on equipment name and the other on time stamp column.
I humbly request you to advise me on the best approach to write the query for this scenario.
The stored procedure I have created process one month of data at a time; for one month, we have around 120 million rows to deal with.
Used
While
loop to put entry forstart_date
andend_date
for each month intodbo.Iteration_ctrl
table, by getting the minimum and maximum date from thedbo.Batch_events
table. So, I have 12 entries in this table, each entry representing one month.Used
while
loop to iterate through the 12start_date
andend_date
entries in thedbo.Iteration_ctrl
table, and used pivot query inside the while loop to load the data intodbo.Time_series
table.
Please see the stored procedure I have written, which is taking 25 hours (which seems to me inefficient). Any help would be appreciated please.
DECLARE @MIN_TIME DATETIME, @MIN_TIMESTAMP DATETIME;
DECLARE @MAX_TIME DATETIME, @MAX_TIMESTAMP DATETIME;
DECLARE @DATE DATETIME, @ROWCOUNT INT, @TOTALCOUNT INT;
SELECT @MIN_TIME = MIN(Time_stamp)
FROM [dbo].[BATCH_EVENTS]
SELECT @MAX_TIME = MAX(Time_stamp)
FROM [dbo].[BATCH_EVENTS]
PRINT 'INSERT INTO TABLE [dbo].[ITERATION_CTRL] HAS STARTED ' + CAST(GETDATE() AS nvarchar(30))
WHILE @MIN_TIME < @MAX_TIME
BEGIN
SELECT @DATE = DATEADD(MM, 01, @MIN_TIME)
SELECT @DATE = CASE WHEN @DATE > @MAX_TIME THEN @MAX_TIME ELSE @DATE END
INSERT INTO dbo.ITERATION_CTRL
SELECT @MIN_TIME, @DATE
PRINT 'INSERTION INTO TABLE [dbo].[ITERATION_CTRL] HAS ENDED FOR'+ CAST(@MIN_TIME AS nvarchar(30)) + ' -' + CAST( @DATE AS nvarchar(30)) +' NO OF ROWS INSERTED :'+ CAST( @@ROWCOUNT AS nvarchar(30)) +' ' + CAST(GETDATE() AS nvarchar(30))
SELECT @MIN_TIME = DATEADD(SS, 01, @DATE)
END
PRINT 'INSERT INTO TABLE [dbo].[Time_series_data] HAS STARTED ' + CAST(GETDATE() AS nvarchar(30))
SELECT @TOTALCOUNT = COUNT(*) FROM dbo.ITERATION_CTRL
SELECT @ROWCOUNT = 1
WHILE @ROWCOUNT <= @TOTALCOUNT
BEGIN
SELECT
@MIN_TIMESTAMP = MIN_DATE,
@MAX_TIMESTAMP = MAX_DATE
FROM dbo.ITERATION_CTRL
WHERE ID = @ROWCOUNT
BEGIN TRANSACTION
INSERT INTO dbo.Time_series_data
SELECT *
FROM
(SELECT
[Event_name], [Time_Stamp],
[Start_time], [End_time], [Duration],
[Value] AS [Sensor_Value],
Equipment_name
FROM
[dbo].[BATCH_EVENTS] BE
WHERE
Time_stamp >= [Start_time] AND Time_stamp <= [End_time]
AND Time_stamp BETWEEN @MIN_TIMESTAMP AND @MAX_TIMESTAMP) t
PIVOT
(MAX([Sensor_Value])
FOR Equipment_Name IN ([MY1102], [MY1138], [MY1180],
[MY1164], [MY1176], [MY204],
[MY324], [MY64B6])
ORDER BY
[Time_Stamp], [Event_name]
COMMIT TRANSACTION
SELECT @ROWCOUNT = @ROWCOUNT + 1
--PRINT @MIN_TIMESTAMP, @MAX_TIMESTAMP
PRINT 'INSERTION INTO TABLE [Time_series_data] HAS ENDED NO OF ROWS INSERTED :'+ CAST( @@ROWCOUNT AS nvarchar(30)) +' For duration ' + CAST( @MIN_TIMESTAMP AS nvarchar(30))+ ' '+ CAST( @MAX_TIMESTAMP AS nvarchar(30))+' Time '+ CAST(GETDATE() AS nvarchar(30));
END
END