Get percent of column value count compared to same column value count in another table during certain time period grouping by column value

48 views Asked by At

Looking for share of each Agent's rows where task = 'Stalking' in table Stalking compared to occurrences in table Assigned during Min and Max stalkDate in table Stalking.

Table Assigned

Agent task taskDate
dbl07 Stalking 2019-12-01
dbl07 patrol 2020-06-01
dbl07 travel 2020-06-24
dbl07 Stalking 2021-10-01
dbl07 Stalking 2021-12-01
dbl07 Stalking 2022-12-24
dbl07 travel 2022-12-01
mrBourne train 2021-06-06
mrBourne Stalking 2022-06-06
mrSmart Stalking 2018-01-01
mrSmart idle 2021-06-06
mrSmart Stalking 2021-01-01
mrSmart Stalking 2022-06-06
mrEnglish lurk 2022-06-06
mrEnglish jest 2022-06-06

Table Stalking, -- stalkDate min-max = 2020-01-31 - 2022-12-31

Agent stalkDate
dbl07 2020-01-31
dbl07 2022-12-31
mrBourne 2022-06-30
mrSmart 2022-06-30
mrEnglish 2022-06-30
mrPowers 2022-06-06

Result should be:

Agent prcnt
dbl07 66.66
mrBourne 100.00
mrSmart 50.00
mrEnglish 100.00
mrPowers 00.00

note: taskDate in Assigned and stalkDate in Stalking are not in sync, the search is made by selecting MIN and MAX stalkDate from Stalking.

SELECT 
    s.Agent, 
        (CONVERT(DECIMAL(10,2),COUNT(s.Agent))/
        CONVERT(DECIMAL(10,2),COUNT(a.Agent))
        ) * 100 as prcnt
FROM 
    Stalking s
FULL OUTER JOIN 
    Assigned a ON s.Agent = a.Agent
WHERE 
    a.task = 'Stalking' 
    AND a.taskDate >= (SELECT MIN(stalkDate) FROM Stalking)
    AND a.taskDate <= (SELECT MAX(stalkDate) FROM Stalking)
GROUP BY
    s.Agent, a.task, a.taskDate

I have 3 problems

  1. For each Agent query results in prcnt = 100.00
  2. How to handle "divide by zero" in case of mrEnglish and assign him prcnt = 100.00?
  3. How to get prcnt = 00.00 to mrPowers? He has no record in Assigned.
1

There are 1 answers

0
3N1GM4 On

Using your example data:

create table #Assigned
(
    Agent varchar(20),
    task varchar(20),
    taskDate date
)

create table #Stalking
(
    Agent varchar(20),
    stalkDate date
)

insert into #Assigned values
('dbl07',    'Stalking','2019-12-01'),
('dbl07',    'patrol',  '2020-06-01'),
('dbl07',    'travel',  '2020-06-24'),
('dbl07',    'Stalking','2021-10-01'),
('dbl07',    'Stalking','2021-12-01'),
('dbl07',    'Stalking','2022-12-24'),
('dbl07',    'travel',  '2022-12-01'),
('mrBourne', 'train',   '2021-06-06'),
('mrBourne', 'Stalking','2022-06-06'),
('mrSmart',  'Stalking','2018-01-01'),
('mrSmart',  'idle',    '2021-06-06'),
('mrSmart',  'Stalking','2021-01-01'),
('mrSmart',  'Stalking','2022-06-06'),
('mrEnglish','lurk',    '2022-06-06'),
('mrEnglish','jest',    '2022-06-06')

insert into #Stalking values
('dbl07',    '2020-01-31'),
('dbl07',    '2022-12-31'),
('mrBourne', '2022-06-30'),
('mrSmart',  '2022-06-30'),
('mrEnglish','2022-06-30'),
('mrPowers', '2022-06-06')

You can get do something like:

select distinct
    ag.Agent,
    mm.minDate,
    mm.maxDate,
    tot.numRecords,
    isnull(tot.numStalking,0) as numStalking,
    case when tot.numRecords = 0 or isnull(tot.numStalking,0) = 0
        then '0.00%'
        else format(convert(decimal(10,2),isnull(tot.numStalking,0)) / convert(decimal(10,2),tot.numRecords),'P')
    end as prcnt
from 
(select distinct Agent from #Stalking) ag
left join #Assigned a on ag.Agent = a.Agent
outer apply
(
    select 
        min(stalkDate) as minDate,
        max(stalkDate) as maxDate
    from #Stalking s
    where s.Agent = a.Agent
) mm
outer apply
(
    select 
        count(*) as numRecords,
        sum(case when ass.task = 'Stalking' then 1 else 0 end) as numStalking
    from #Assigned ass
    where ass.taskDate >= mm.minDate
    and ass.taskDate <= mm.maxDate
    and a.Agent = ass.Agent
) tot

which gives the output:

Agent minDate maxDate numRecords numStalking prcnt
dbl07 31/01/2020 31/12/2022 6 3 50.00%
mrBourne 30/06/2022 30/06/2022 0 0 0.00%
mrEnglish 30/06/2022 30/06/2022 0 0 0.00%
mrPowers NULL NULL 0 0 0.00%
mrSmart 30/06/2022 30/06/2022 0 0 0.00%

This actually differs from your stated expected output, but looking at your example data, I believe 50% is the correct percentage for dbl07 as they have 6 records in Assigned which fall into their range of dates in Stalking, three of which have the task value of "Stalking".

There might be a more elegant way to do this without OUTER APPLY (perhaps using Window Functions), but this is just the method which came to mind.