How Azure dedicated pool partition switch work efficiently when data is sharded over 60 distributions

155 views Asked by At

A table contains xyz columns, with 3 years of data.

Index                    = clustered column index
Hash distribution column = product.
Partition column         = date.

As the new year data arrive like 2010, 2011, 2012, 2013(new).

2010 year data must be deleted, using the partition switching.

But Hash will distribute according to similar products data, and when data will be partitioned according to date it will not in same location/container/distribution.

So how partition switch will be fast as DB engine needs to run query parallel in all distributions and it will look for 2010 data in each and then remove it?

Or is there something that I am missing?

1

There are 1 answers

0
DileeprajnarayanThumula On

I have tried to reproduce the solution.

  • I have created two tables, Sales and Sales_New, with the same schema.For partition switching.
  • Partitioned both tables by SaleDate
    attempted to switch partition 4 from Sales_New to Sales.

The below is the code:
Step 1: Create new table.

CREATE  TABLE [dbo].[Sales_New]
(
[SaleID] INT  NOT  NULL,
[ProductID] INT  NOT  NULL,
[SaleDate] DATE  NOT  NULL,
[SaleAmount] DECIMAL(18,2) NOT  NULL,
[SaleQuantity] INT  NOT  NULL,
[SaleLocation] VARCHAR(50) NOT  NULL
)
WITH
(
DISTRIBUTION = HASH([ProductID]),
CLUSTERED COLUMNSTORE INDEX,
PARTITION
([SaleDate] RANGE  RIGHT  FOR  VALUES ('20090101', '20100101', '20110101', '20120101', '20130101', '20140101')
)
);
INSERT  INTO [dbo].[Sales_New]
SELECT *
FROM [dbo].[Sales]
WHERE [SaleDate] >= '20100101'

Step 2: Delete 2010 from Sales table

DELETE  FROM [dbo].[Sales]
WHERE [SaleDate] >= '20100101'
AND [SaleDate] < '20110101';

enter image description here

2010 records in sales_new table.

enter image description here

Switch Partrition:

ALTER  TABLE [dbo].[Sales_New] SWITCH PARTITION  4  TO [dbo].[Sales] PARTITION  4  WITH (TRUNCATE_TARGET = ON);

Partition switch operation between two tables, [dbo].[Sales_New] and [dbo].[Sales], specifically for partition number 4.

The below is the partition logic I have tried.

  1. Partition 1: [SaleDate] < '20090101'
  2. Partition 2: '20090101' <= [SaleDate] < '20100101'
  3. Partition 3: '20100101' <= [SaleDate] < '20110101'
  4. Partition 4: '20110101' <= [SaleDate] < '20120101'
  5. Partition 5: '20120101' <= [SaleDate] < '20130101'

As you want to delete the 2010 values falling between ‘20110101’ and ‘20120101’ would belong to Partition 4.

  • As you mentioned how partition switch will be fast as DB engine This operation is used in scenarios where you want to efficiently move data from one table to another without physically copying it.

  • The partition switch operation between two tables, [dbo].[Sales_New] and [dbo].[Sales], specifically for partition number 4.