SQL Server Data Model and Insert Performance

Question

SQL Server Data Model and Insert Performance

71 views Asked by Jasper At 30 March 2024 at 12:13

I am scraping messages about power plant unavailability and converting them into timeseries and storing them in a SQL Server database.

My current structure is the following:

Messages: publicationDate datetime, messageSeriesID nvarchar, version int, messageId identity

The primary key is on (messageSeriesId, version)
Units: messageId int, area nvarchar, fueltype nvarchar, unitname nvarchar tsId identity

The primary key is on tsId. There is a foreign key relation on tsId between this table and Messages. The main reason for this table is that one message can contain information about multiple power plants.
Timeseries: tsId int, delivery datetime, value decimal

I have a partition scheme based on delivery, each partition contains a month of data. The primary key is on (tsId, delivery) and it's partitioned along the monthly partition scheme. There is a foreign key on tsId to tsId in the Units table.

The Messages and Units tables contain around a million rows each. The Timeseries table contains about 500 million rows.

Now, every time I insert a new batch of data, one row goes into the Messages table, between one and a few (4) go into the Units table, and a lot (up to 100.000s) go into the Timeseries table.

The problem I'm encountering is that inserts into the Timeseries table are too slow (100.000 rows take up to a minute). I already made some improvements on this by setting the fillfactor to 80 instead of 100 when rebuilding the index there. However its still too slow.

And I am a bit puzzled, because the way I understand it is this: every partition contains all rows with delivery in that month, but the primary key is on tsId first and delivery second. So to insert data in this partition, it should simply be placed at the end of the partition (since tsId is the identity column and thus increasing by one every transaction).

The time series that I am trying to insert spans 3 years and therefore 36 partitions. If I, however, create a time series with the same length that falls within a single partition the insert is notable faster (around 1.5 second). Likewise if I create an empty time series table (timeseries_test) with the same structure as the original one, then inserts are also very fast (also for inserting data that spans 3 years). However, querying is done based mainly on delivery, so I don't think partitioning by tsId is a good idea.

If anyone has a suggestion on the structure or methods to improve inserts it would be greatly appreciated.

Create table statements (I changed the order of the primary key on the timeseries table, but it didn't make any difference, in fact it seemed to slow down inserts):

CREATE TABLE [dbo].[remit_messages]
(
    [publicationDate] [datetime2](0) NOT NULL,
    [version] [int] NOT NULL,
    [messageId] [int] IDENTITY(1,1) NOT NULL,
    [messageSeriesId] [nvarchar](36) NOT NULL,

    PRIMARY KEY CLUSTERED ([messageSeriesId] ASC, [version] ASC)
            WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, 
                  IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, 
                  ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

/****** Object:  Index [dbo_remit_messages_messageId]    Script Date: 2024-03-30 13:26:36 ******/
CREATE UNIQUE NONCLUSTERED INDEX [dbo_remit_messages_messageId] 
ON [dbo].[remit_messages] ([messageId] ASC)
       WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, 
             SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, 
             DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, 
             ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

CREATE TABLE [dbo].[remit_units]
(
    [tsId] [int] IDENTITY(1,1) NOT NULL,
    [fuelTypeId] [int] NOT NULL,
    [areaId] [int] NOT NULL,
    [messageId] [int] NOT NULL,
    [unitName] [nvarchar](200) NULL,

    PRIMARY KEY CLUSTERED ([messageId] ASC, [tsId] ASC)
            WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, 
                  IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, 
                  ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
    CONSTRAINT [dbo_remit_tsId] UNIQUE NONCLUSTERED ([tsId] ASC)
            WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, 
                  IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, 
                  ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

/****** Object:  Index [dbo_remit_units_tsid]    Script Date: 2024-03-30 13:30:39 ******/
CREATE NONCLUSTERED INDEX [dbo_remit_units_tsid] 
ON [dbo].[remit_units] ([tsId] ASC)
         WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, 
               SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, 
               ONLINE = OFF, ALLOW_ROW_LOCKS = ON, 
               ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

ALTER TABLE [dbo].[remit_units] WITH CHECK 
    ADD FOREIGN KEY([messageId])
        REFERENCES [dbo].[remit_messages] ([messageId])
            ON UPDATE CASCADE
            ON DELETE CASCADE
GO

CREATE TABLE [dbo].[remit_ts]
(
    [tsId] [int] NOT NULL,
    [delivery] [datetime2](0) NOT NULL,
    [available] [decimal](11, 3) NULL,
    [unavailable] [decimal](11, 3) NULL,

    PRIMARY KEY CLUSTERED ([delivery] ASC, [tsId] ASC)
            WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, 
                  IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, 
                  ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [MonthlyPartitionScheme]([delivery])
) ON [MonthlyPartitionScheme]([delivery])
GO

/****** Object:  Index [idx_remit_ts_delivery_inc]    Script Date: 2024-03-30 13:33:34 ******/
CREATE NONCLUSTERED INDEX [idx_remit_ts_delivery_inc] 
ON [dbo].[remit_ts] ([delivery] ASC)
INCLUDE([tsId], [unavailable], [available]) 
         WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, 
               SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
               ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [MonthlyPartitionScheme]([delivery])
GO

ALTER TABLE [dbo].[remit_ts] WITH CHECK 
    ADD FOREIGN KEY([tsId])
        REFERENCES [dbo].[remit_units] ([tsId])
            ON UPDATE CASCADE
            ON DELETE CASCADE
GO

Actual execution plan:

https://www.brentozar.com/pastetheplan/?id=HJJiFFS1R

Original Q&A

TechQA.

SQL Server Data Model and Insert Performance

There are 0 answers

Related Questions in SQL-SERVER

Related Questions in PARTITIONING

Popular Questions

Trending Questions