Taking a Sample in a SQL Query


I'm working on a problem that looks something like this:

I have a table with many columns, but the main ones are DepartmentId and EmployeeId:

Employee Ids    Department Ids
------------------------------
A                   1
B                   1
C                   1
D                   1
AA                  2
BB                  2
CC                  2
A1                  3
B1                  3
C1                  3
D1                  3

I want to write a SQL query that returns 2 sample employee IDs for each department ID, like this:

Employee Id  Dept Ids
B              1
C              1
AA             2
CC             2
D1             3
A1             3

Currently my query is:

select
   EmployeeId, DeptIds, count(*)
from 
   table_name
group by 1,2
sample 2

but it returns only two rows in total, not two per department.

Any help?

Best answer (by dnoeth)

If the departments are known and few in number, you could use stratified sampling:

-- stratified sample: one WHEN per department, 2 random rows from each
select *
from table_name
sample
   when DeptIds = 1 then 2
   when DeptIds = 2 then 2
   when DeptIds = 3 then 2
end
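Conceptually, the stratified SAMPLE above draws a fixed number of rows from each stratum independently. Here is a minimal Python sketch of that idea, assuming the rows are already in memory as (employee_id, dept_id) tuples. The function `stratified_sample` and its `sizes` mapping (which plays the role of the WHEN ... THEN clauses) are hypothetical illustrations, not Teradata's implementation.

```python
import random
from collections import defaultdict


def stratified_sample(rows, sizes, seed=None):
    """Draw sizes[dept] random rows from each department listed in sizes.

    rows  -- iterable of (employee_id, dept_id) tuples
    sizes -- hypothetical mapping dept_id -> number of rows wanted
    In this sketch, departments missing from sizes are skipped.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for emp, dept in rows:
        if dept in sizes:
            groups[dept].append((emp, dept))
    result = []
    for dept, members in groups.items():
        # min() guards against a department smaller than the requested size
        result.extend(rng.sample(members, min(sizes[dept], len(members))))
    return result


rows = [("A", 1), ("B", 1), ("C", 1), ("D", 1),
        ("AA", 2), ("BB", 2), ("CC", 2),
        ("A1", 3), ("B1", 3), ("C1", 3), ("D1", 3)]
print(stratified_sample(rows, {1: 2, 2: 2, 3: 2}, seed=42))
```

Every department has to be listed explicitly, which is why this approach only suits a small, known set of departments.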

Otherwise, combine RANDOM and ROW_NUMBER: give each row a random number, then keep the two rows with the lowest random value in each department:

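select *
from
 (
   -- assign a random number to every row
   select EmployeeId, DeptIds, random(1,10000000) as rand
   from table_name
 ) as dt
qualify
   -- number each department's rows in random order and keep the first 2
   row_number()
   over (partition by DeptIds
         order by rand) <= 2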
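`sel`, `random(low, high)` and `QUALIFY` are Teradata-specific. As a hedged sketch, the same RANDOM plus ROW_NUMBER idea can be written in more portable SQL and run against an in-memory SQLite database from Python. This assumes SQLite 3.25 or later (for window functions); the table name `emp` is hypothetical. Because SQLite has no QUALIFY, the row number is filtered in an outer query instead.

```python
import sqlite3

# Hypothetical table mirroring the question's data.
# Requires SQLite >= 3.25 for window functions such as ROW_NUMBER().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (EmployeeId TEXT, DeptIds INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("A", 1), ("B", 1), ("C", 1), ("D", 1),
     ("AA", 2), ("BB", 2), ("CC", 2),
     ("A1", 3), ("B1", 3), ("C1", 3), ("D1", 3)],
)

# Number each department's rows in random order, then keep rows 1 and 2.
# The outer WHERE does the job of Teradata's QUALIFY.
SAMPLE_SQL = """
SELECT EmployeeId, DeptIds
FROM (
    SELECT EmployeeId, DeptIds,
           ROW_NUMBER() OVER (PARTITION BY DeptIds ORDER BY RANDOM()) AS rn
    FROM emp
) AS dt
WHERE rn <= 2
ORDER BY DeptIds
"""

sample = conn.execute(SAMPLE_SQL).fetchall()
for employee_id, dept_id in sample:
    print(employee_id, dept_id)
```

Unlike the stratified SAMPLE, this version needs no list of departments: the PARTITION BY handles however many departments exist, and changing `rn <= 2` changes the per-department sample size.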