How can I merge two select queries with counts of mostly identical columns but one set returns multiple rows

58 views Asked by At

I have two queries that work perfectly:

DECLARE @StartDate DATETIME = '2021-11-01 00:00:00';
DECLARE @EndDate DATETIME = '2022-03-16 23:59:59';
DECLARE @SalesEstimateTransactionTypeId INT = 16;
DECLARE @SalesOrderTransactionTypeId INT = 15;

SELECT 
    DATENAME(mm, GeneralJournal.[TransactionDate]) AS ReportingMonth, 
    DATEPART(mm, GeneralJournal.[TransactionDate]) AS MonthNumber, 
    DATEPART(yyyy, GeneralJournal.[TransactionDate]) AS ReportingYear,
    COUNT(TransactionId) AS TransactionCount
FROM 
    GeneralJournal 
WHERE 
    GeneralJournal.[TransactionDate] >= @StartDate 
    AND GeneralJournal.[TransactionDate] <= @EndDate 
    AND MasterRecord = 1 
    AND TransactionTypeId = @SalesEstimateTransactionTypeId
GROUP BY 
    DATEPART(yyyy, GeneralJournal.[TransactionDate]), 
    DATEPART(mm, GeneralJournal.[TransactionDate]),  
    DATENAME(mm,GeneralJournal.[TransactionDate]);

SELECT 
    DATENAME(mm, GeneralJournal.[TransactionDate]) AS ReportingMonth, 
    DATEPART(mm, GeneralJournal.[TransactionDate]) AS MonthNumber, 
    DATEPART(yyyy, GeneralJournal.[TransactionDate]) AS ReportingYear, 
    COUNT(DISTINCT TransactionId) AS ConversionCount
FROM 
    GeneralJournal 
WHERE 
    GeneralJournal.[TransactionDate] >= @StartDate 
    AND GeneralJournal.[TransactionDate] <= @EndDate 
    AND MasterRecord = 0 
    AND TransactionTypeId = @SalesOrderTransactionTypeId 
    AND SEReferenceId > 0
GROUP BY 
    DATEPART(yyyy, GeneralJournal.[TransactionDate]), 
    DATEPART(mm, GeneralJournal.[TransactionDate]),  
    DATENAME(mm,GeneralJournal.[TransactionDate]);

Note that the second query returns distinct because it can return multiple values and we only want to count each TransactionId once in that scenario. These return the following results:

ReportingMonth MonthNumber ReportingYear TransactionCount
November 11 2021 82
December 12 2021 49
January 1 2022 64
February 2 2022 67
March 3 2022 49
ReportingMonth MonthNumber ReportingYear ConversionCount
November 11 2021 42
December 12 2021 27
January 1 2022 31
February 2 2022 50
March 3 2022 24

I actually need to combine them like this:

ReportingMonth MonthNumber ReportingYear TransactionCount ConversionCount
November 11 2021 82 42
December 12 2021 49 27
January 1 2022 64 31
February 2 2022 67 50
March 3 2022 49 24

I have tried pretty much everything I can think of - Unions, Joins, Subqueries - but so far nothing is quite right. This is the closest I can get:

SELECT 
    DATENAME(mm, GeneralJournal.[TransactionDate]) AS ReportingMonth, 
    DATEPART(mm, GeneralJournal.[TransactionDate]) AS MonthNumber, 
    DATEPART(yyyy, GeneralJournal.[TransactionDate]) AS ReportingYear, 
    SUM(CASE 
            WHEN TransactionTypeId = @SalesEstimateTransactionTypeId 
                 AND MasterRecord = 1 
               THEN 1 ELSE 0 
        END) AS TransactionCount, 
    COUNT(CASE 
              WHEN TransactionTypeId = @SalesOrderTransactionTypeId  
                   AND SEReferenceId > 0 THEN 1 
          END) AS ConversionCount
FROM 
    GeneralJournal 
WHERE 
    GeneralJournal.[TransactionDate] >= @StartDate 
    AND GeneralJournal.[TransactionDate] <= @EndDate 
    AND TransactionTypeId IN (@SalesOrderTransactionTypeId, @SalesEstimateTransactionTypeId)
GROUP BY 
    DATEPART(yyyy, GeneralJournal.[TransactionDate]), 
    DATEPART(mm, GeneralJournal.[TransactionDate]),    
    DATENAME(mm,GeneralJournal.[TransactionDate]);

However, I am unable to find a way to get a Distinct value for the ConversionCount. As a result it is returning the full count:

ReportingMonth MonthNumber ReportingYear TransactionCount ConversionCount
November 11 2021 82 152
December 12 2021 49 67
January 1 2022 64 101
February 2 2022 67 136
March 3 2022 49 64

Can anyone guide me towards a way to combine the two query results whilst maintaining the Distinct on the conversion count? I must add that for it to work the answer must be compatible with both SQL Server and VistaDB the syntax of which is a subset of T-SQL because I am obliged to support both database engines with the same query.

EDIT - The Final Solution

Following on from Nick's excellent answer I was able embed the solution into my existing query code to ensure that there are results even for months with no records, shown here in case it helps anyone else:

DECLARE @StartDate DATETIME = '2021-11-01T00:00:00';
DECLARE @EndDate DATETIME = '2022-10-31T23:59:59';
DECLARE @SalesEstimateTransactionTypeId INT = 16;
DECLARE @SalesOrderTransactionTypeId INT = 15;

DECLARE @CurrentDate DATETIME;
DECLARE @Months TABLE(ReportingYear INT, MonthNumber INT, ReportingMonth VARCHAR (40));

-- Set the initial date
SET @CurrentDate = @StartDate
-- insert all dates into temp table
WHILE @CurrentDate <=  @EndDate
BEGIN
    INSERT INTO @Months VALUES(DATEPART(year, @CurrentDate), DATEPART(month, @CurrentDate), DATENAME(mm, @CurrentDate))
    SET @CurrentDate = dateadd(mm, 1, @CurrentDate)
END;

SELECT ReportingMonth, ReportingYear, Coalesce(TransactionCount, 0) AS TransactionCount, Coalesce(ConversionCount,0) AS ConversionCount
FROM
(
    SELECT months.[ReportingMonth], months.[ReportingYear], conversionData.[TransactionCount], conversionData.[ConversionCount]
    FROM @Months months
    LEFT JOIN
    (
        SELECT
        ReportingMonth      = DATENAME(mm, GeneralJournal.[TransactionDate]),
        MonthNumber         = DATEPART(mm, GeneralJournal.[TransactionDate]),
        ReportingYear       = DATEPART(yyyy, GeneralJournal.[TransactionDate]),
        TransactionCount    = SUM(CASE WHEN TransactionTypeId = @SalesEstimateTransactionTypeId AND GeneralJournal.[MasterRecord] = 1 THEN
                                        1
                                    ELSE
                                        0
                                END
                            ),
        ConversionCount     = COUNT(DISTINCT CASE WHEN GeneralJournal.[TransactionTypeId] = @SalesOrderTransactionTypeId
                                        AND GeneralJournal.[SEReferenceId] > 0
                                        AND GeneralJournal.[MasterRecord] = 0 THEN
                                        GeneralJournal.[TransactionID]
                                END
                            )
        FROM GeneralJournal
        WHERE GeneralJournal.[TransactionDate] >= @StartDate
            AND GeneralJournal.[TransactionDate] <= @EndDate
            AND GeneralJournal.[TransactionTypeId] IN ( @SalesOrderTransactionTypeId, @SalesEstimateTransactionTypeId)
        GROUP BY
            DATEPART(yyyy, GeneralJournal.[TransactionDate]),
            DATEPART(mm, GeneralJournal.[TransactionDate]),
            DATENAME(mm, GeneralJournal.[TransactionDate])
    ) as conversionData
    ON months.[ReportingYear] = conversionData.[ReportingYear] AND months.[MonthNumber] = conversionData.[MonthNumber]
) AS data;
2

There are 2 answers

2
Nick Fotopoulos On BEST ANSWER

Your second query that is close, I think just has a couple of minor omissions.

  1. You forgot MasterRecord = 0 in your ConversionCount CASE statement.
  2. Instead of returning 1 or 0 from your ConversionCount CASE you should return TransactionID or NULL so you can still count distinct values.
  3. You are missing DISTINCT inside of your ConversionCount COUNT.
  4. You will need to handle NULL values in the ConversionCount COUNT. I assumed you will always have at one or more NULLs, so I just subtract 1 from the COUNT(DISTINCT ...) to compensate.

(I can't be 100% on the syntax here without some example detail data to work with.)

Code

SELECT
    ReportingMonth      = DATENAME(mm, GeneralJournal.TransactionDate),
    MonthNumber         = DATEPART(mm, GeneralJournal.TransactionDate),
    ReportingYear       = DATEPART(yyyy, GeneralJournal.TransactionDate),
    TransactionCount    = SUM(CASE
                                WHEN TransactionTypeId = @SalesEstimateTransactionTypeId
                                    AND MasterRecord = 1 THEN
                                    1
                                ELSE
                                    0
                            END
                        ),
    ConversionCount     = COUNT(DISTINCT CASE
                                WHEN TransactionTypeId = @SalesOrderTransactionTypeId
                                    AND SEReferenceId > 0
                                    AND MasterRecord = 0 THEN
                                    TransactionID
                                ELSE
                                    NULL
                            END
                        ) - 1 /* Subtract 1 for the NULL */
FROM    GeneralJournal
WHERE
    GeneralJournal.TransactionDate >= @StartDate
    AND GeneralJournal.TransactionDate <= @EndDate
    AND TransactionTypeId IN (
            @SalesOrderTransactionTypeId,
            @SalesEstimateTransactionTypeId
        )
GROUP BY
    DATEPART(yyyy, GeneralJournal.TransactionDate),
    DATEPART(mm, GeneralJournal.TransactionDate),
    DATENAME(mm, GeneralJournal.TransactionDate);
2
Charlieface On

You can just put the two columns in the same query. It is made more complicated by the fact that the WHERE clauses are slightly different. SO you need to group, then group again, and use conditional aggregation to count the right rows for each column.

Note the following:

  • You could in theory do COUNT(DISTINCT CASE however that is normally slower as the compiler will not recognize what the CASE is doing and instead do a full sort.
  • It is faster to group by a single EOMONTH calculation to group by a whole month. You can pull out the year and month in the SELECT.
  • COUNT(TransactionId) will return the number of non-null TransactionId values. if TransactionId cannot be null then COUNT(*) is the same thing.
  • If TransactionDate has a time component then you should use a half-open interval >= AND <
  • Use aliases on tables, it makes your queries more readable.
  • Use whitepsace, it's free.
DECLARE @StartDate DATETIME = '2021-11-01T00:00:00';
DECLARE @EndDate DATETIME = '2022-03-17T00:00:00';
DECLARE @SalesEstimateTransactionTypeId INT = 16;
DECLARE @SalesOrderTransactionTypeId INT = 15;

SELECT
  DATENAME(month, gj.mth) AS ReportingMonth,
  DATEPART(month, gj.mth) AS MonthNumber,
  DATEPART(year , gj.mth) AS ReportingYear,
  SUM(TransactionCount) AS TransactionCount,
  COUNT(CASE WHEN ConversionCount > 0 THEN 1 END) AS ConversionCount
FROM (
    SELECT
      EOMONTH(gj.TransactionDate) AS mth,
      gj.TransactionId,
      COUNT(CASE WHEN gj.MasterRecord = 1 AND gj.TransactionTypeId = @SalesEstimateTransactionTypeId THEN 1 END) AS TransactionCount,
      COUNT(CASE WHEN gj.MasterRecord = 0 AND gj.TransactionTypeId = @SalesOrderTransactionTypeId AND gj.SEReferenceId > 0 THEN 1 END) AS ConversionCount
    FROM GeneralJournal gj
    WHERE gj.TransactionDate >= @StartDate
      AND gj.TransactionDate <  @EndDate
      AND gj.TransactionTypeId IN (@SalesOrderTransactionTypeId, @SalesEstimateTransactionTypeId)
    GROUP BY
      EOMONTH(gj.TransactionDate),
      TransactionId
) g
GROUP BY
  mth;