Question on dedicated SQL pool, nodes and parallel number

109 views Asked by At

There is a question:

When your dedicated SQL pool performance level expands to 10 nodes, what is the recommended or optimal number of files to load? A. 6 B. 60 C. 600 D. 300

and it's answer:

C. 600 / 10 = 60, which abides by the law of 60. By default, a dedicated SQL pool has 60 distributions per node, so 600 files can all run in parallel, concurrently, on these 10 nodes. As the performance level of the dedicated SQL pool expands to 10 nodes, it is generally recommended to load around 600 files. Distributing the data across a sufficient number of files allows for parallel processing and optimal utilization of the available resources within the SQL pool, resulting in improved performance and query execution times.

While I found in https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-manage-compute-overview that a dedicated SQL pool is always stored in 60 distributions and the more nodes there are, each node has less distributions allocated. Then the answer is B 60 no matter how many nodes up to 60. I'm confused. Which is right, please?

1

There are 1 answers

0
Arko On BEST ANSWER

Based on your understanding from the Azure documentation and the concept of fixed distributions, the correct answer seems to be: B. 60

Regardless of the number of nodes (up to 60), there are always 60 distributions in total in a dedicated SQL pool. Therefore, to optimize parallel processing, having 60 files (one for each distribution) is ideal. for optimal parallel data loading in Azure Synapse Analytics. This approach ensures that each distribution is effectively utilized.

Why?

Because, when scaling your dedicated SQL pool in Azure Synapse Analytics to 10 nodes, the optimal number of files for data loading is a key consideration for efficient parallel processing. Azure Synapse Analytics, regardless of the number of compute nodes, consistently utilizes 60 distributions. This means that the total number of distributions does not change with an increase in nodes. Instead, these distributions are divided among the available nodes. Therefore, with more nodes, each node handles fewer distributions, but the total remains the same. This approach ensures that each distribution is effectively utilized, leading to enhanced performance in data processing and query execution times.

Reference Document: MS Doc