How to improve query Performance while using SQLCLR user defined aggregate function(UDA)

115 views Asked by At

We find the performance issue while using SQLCLR user defined aggregate function(UDA),

Here are our scenario: We have 2 columns needed to calculate: key and value, whose value would like:

key value
Row_1 a/b/c/d/e 1/2/3/2/1
Row_2 a/b/c/d/e 2/0/1/2/3
Row_3 a/b/c/d/e 2/3/4/1/2

We need an aggregate function to get the sum of each metric, in the upper example, we want to get the result like this:

key value
Result a/b/c/d/e 5/5/8/5/6

there is no native aggregate function we can use to get this kind of result, so we use an SQLCLR UDA to achieve this result. We found that the performance of this UDA is bad when SQLCLR UDA used with a GROUP BY clause.

After some investigation, we find that the reason below:

  1. When we use a SQLCLR UDA, a StreamAggregate operation must be used, and an expensive sort operator will be introduced, decreasing the performance of the UDA.
  2. When we use a SQLCLR UDA, only row mode can be used to calculate the results in the sort and aggregate operator.

So, my question:

  1. Is there any chance for user to force SQL Server to use hash aggregator operator instead of stream operator while using user defined aggregate function?
  2. Is there any chance for user to get ride of the sort operator while using user defined aggregate function?
  3. Is there any chance for user to use batch mode while using user defined aggregate function?
1

There are 1 answers

7
SQLpro On

You will never obtain any performances when the database violate the first normal form... Because this conducts to not having a Relational Database... and a relational engine is especially designed to manipulate quickly relational data but not non relational data.

This is not the problem of UDA performance, but your design which leads to severe counter performances.

First normal form says that a column in a table must ALWAYS have a single (scalar) value. You put a list of values, that violate the first normal form.

Just redesign the database by adding a child table and put the keys and values inside this child table and you will have performance!

You can try this way:

CREATE TABLE T_CHILD
(ID_CHILD       INT IDENTITY PRIMARY KEY,
 ID_ROW         VARCHAR(32), --REFERENCES T_FATHER (ID_ROW),
 KEY_CHILD      VARCHAR(16),
 VALUE_CHILD    INT);
GO

INSERT INTO T_CHILD
SELECT ID_ROW, k.value, V_EMP.value
FROM   T_FATHER
       OUTER APPLY STRING_SPLIT([key], '/') AS k
       OUTER APPLY STRING_SPLIT([value], '/') AS v;