I have a simple data set of customers (about 40,000k). It looks like:
customerid, group, other_variable
a,blue,y
b,blue,x
c,blue,z
d,green,y
e,green,d
f,green,r
g,green,e
For each group, I want to randomly select Y customers (along with their other variables). The catch is, I want two random selections of Y customers for each group, i.e.
4000 random green customers split into two sets of 2000 randomly
and 4000 random blue customers split into two sets of 2000 randomly
This is because I have different messages to give to the two different splits. I'm not sampling with replacement; the customers need to be unique.
I would prefer a solution in PROC SQL, but I'm happy with an alternative solution in SAS if PROC SQL isn't ideal.
proc surveyselect is the general tool of choice for random sampling in SAS. The code is very simple, I would just sample 4000 of each group, then assign a new subgroup every 2000 rows, since the data is in a random order anyway (although sorted by group). The default sampling method for proc surveyselect is srs, which is simple random sampling without replacement, exactly what is required here. Here's some example code.
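A minimal sketch of that approach, not a definitive implementation. The data set name customers, the output names, and both seeds are assumptions made for illustration; the column names come from the sample data in the question. Because I am not certain the rows come out of proc surveyselect in random order, the sketch shuffles each group with a random key before cutting it into two halves, so the split does not depend on the original row order.

```sas
/* Assumed input: work.customers with columns customerid, group, other_variable */

/* STRATA processing expects the input sorted by the stratum variable */
proc sort data=customers out=customers_sorted;
    by group;
run;

/* Draw 4000 customers per group, without replacement (method=srs) */
proc surveyselect data=customers_sorted out=sampled
                  method=srs sampsize=4000 seed=12345;
    strata group;
run;

/* Attach a random key so the order within each group can be shuffled */
data sampled_keyed;
    if _n_ = 1 then call streaminit(67890);   /* seed the RAND stream once */
    set sampled;
    shuffle_key = rand('uniform');
run;

proc sort data=sampled_keyed;
    by group shuffle_key;
run;

/* First 2000 shuffled rows of each group become subgroup 1, the rest subgroup 2 */
data sample_split;
    set sampled_keyed;
    by group;
    if first.group then row_in_group = 0;
    row_in_group + 1;                          /* sum statement, retained across rows */
    subgroup = ifn(row_in_group <= 2000, 1, 2);
    drop shuffle_key row_in_group;
run;
```

Each customer appears at most once in sample_split, and each group should end up with 2000 rows in subgroup 1 and 2000 in subgroup 2. Note that sampsize=4000 applies to every stratum, so a group with fewer than 4000 customers would need separate handling.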