SAS how to get random selection by group randomly split into multiple groups

7.4k views Asked by shecode At 11 June 2015 at 00:28

I have a simple data set of customers (about 40,000k). It looks like:

customerid, group, other_variable
a,blue,y
b,blue,x
c,blue,z
d,green,y
e,green,d
f,green,r
g,green,e

For each group, I want to randomly select Y customers (along with their other variables). The catch is that I want two random selections of Y customers for each group, i.e.:

4000 random green customers split into two sets of 2000 randomly
and 4000 random blue customers split into two sets of 2000 randomly

This is because I have different messages to give to the two different splits. I'm not sampling with replacement; the customers need to be unique.

I would prefer a solution in PROC SQL, but I'm happy with an alternative solution in SAS if PROC SQL isn't ideal.

There are 3 answers

yukclam9 On 11 June 2015 at 01:43

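/* attach a uniform random number to every observation */
data custgroup;
set sorted_data;
point = ranuni(0); /* seed 0 uses the system clock; use a positive seed to reproduce the sample */
run;

/* sort by group, then by the random number, to shuffle the rows within each group */
proc sort data=custgroup out=sortedcust;
by group point;
run;

/* number the rows within each group: i = 1, 2, 3, ... */
data final;
set sortedcust;
by group point;
if first.group then i = 0;
i + 1;
run;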

Basically, I first assign a random number to every observation in the data set, then sort by the variables group and point.

This gives a random ordering of the observations within each group. The counter i, which is reset at the first row of each group and then incremented, identifies each row's position within its group. Selecting rows by that position avoids extracting duplicate observations. Use output statements as well to control which data set each observation is written to, based on i.

My approach may not be the most efficient one.
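
The output-based split described above could look like the sketch below. It assumes the sortedcust data set produced by the sort step and 2000 customers per split, as in the question. The names split_a, split_b, and rownum are hypothetical.

```sas
/* Sketch: write the first 2000 shuffled rows of each group to split_a
   and the next 2000 to split_b. Assumes sortedcust is sorted by group
   and then by the random number. Output names are hypothetical. */
data split_a split_b;
set sortedcust;
by group;
if first.group then rownum = 0; /* restart the counter for each group */
rownum + 1;                     /* sum statement: value is retained across rows */
if rownum <= 2000 then output split_a;
else if rownum <= 4000 then output split_b;
drop rownum;
run;
```

Rows beyond the first 4000 of a group are not written to either data set, and because every row has exactly one position, no customer can appear in both splits.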

Oliver On 11 June 2015 at 14:56

The code below should do it. First, generate a random number for each row. As Joe said above, it is better to seed it with a specific number so that you can reproduce the sample if necessary. Then you can use PROC SQL with the outobs= option to generate a sample.

(BTW, it would be a good idea not to name a variable 'group'.)

data YourDataSet;
set YourDataSet;
myrandomnumber = ranuni(123);
run;

proc sql outobs=2000;
create table bluesample as
select *
from YourDataSet
where group eq 'blue'
order by myrandomnumber;
quit;

proc sql outobs=2000;
create table greensample as
select *
from YourDataSet
where group eq 'green'
order by myrandomnumber;
quit;
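
The code above gives one sample of 2000 per group. To get the two non-overlapping sets the question asks for, one possible extension is to take 4000 rows per group and split them by row position. This is a sketch, assuming YourDataSet already carries myrandomnumber from the data step above. The names blue4000, blue_a, and blue_b are hypothetical.

```sas
/* Sketch: take the 4000 blue rows with the smallest random numbers */
proc sql outobs=4000;
create table blue4000 as
select *
from YourDataSet
where group eq 'blue'
order by myrandomnumber;
quit;

/* split the 4000 rows into two sets of 2000 by row position */
data blue_a blue_b;
set blue4000;
if _n_ <= 2000 then output blue_a;
else output blue_b;
run;
```

The same two steps would be repeated for 'green'. Because the table is stored in random-number order, splitting by position keeps the two sets random and disjoint.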

Longfish (Accepted Answer) On 11 June 2015 at 11:26

proc surveyselect is the general tool of choice for random sampling in SAS. The code is very simple: I would just sample 4000 from each group, then assign a new subgroup every 2000 rows, since the data is in a random order anyway (although sorted by group).

The default sampling method for proc surveyselect is srs, which is simple random sampling without replacement, exactly what is required here.

Here's some example code.

/* create dummy dataset */
data have;
do customerid = 1 to 10000;
length group other_variable $8;
if rand('uniform')<0.5 then group = 'blue'; /* assign blue or green with equal likelihood */
    else group = 'green';
other_variable = byte(97+(floor((1+122-97)*rand('uniform')))); /* random letter between a and z */
output;
end;
run;

/* dataset must be sorted by group variable */
proc sort data=have;
by group;
run;

/* extract random sample of 4000 from each group */
proc surveyselect data=have
                    out=want
                    n=4000
                    seed=12345; /* specify seed to enable results to be reproduced */
strata group; /* set grouping variable */
run;

/* assign a new subgroup for every 2000 rows */
data want;
set want;
sub=int((_n_-1)/2000)+1;
run;
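
One caveat worth checking: as far as I know, the srs method writes the selected rows in the same order they had in the input data set, so splitting by row position is only random if the input order within each group is itself random. A minimal sketch that shuffles the sample within each group before splitting is shown below. It assumes the want data set from the surveyselect step. The names shuffled, want_split, u, and rownum, and the seed value, are hypothetical.

```sas
/* attach a reproducible uniform random number to each sampled row */
data shuffled;
set want;
if _n_ = 1 then call streaminit(2015); /* arbitrary seed, set once */
u = rand('uniform');
run;

/* shuffle the rows within each group */
proc sort data=shuffled;
by group u;
run;

/* rows 1-2000 of each group get sub=1, rows 2001-4000 get sub=2 */
data want_split;
set shuffled;
by group;
if first.group then rownum = 0;
rownum + 1;
sub = ceil(rownum / 2000);
drop u rownum;
run;

/* check the split sizes per group */
proc freq data=want_split;
tables group*sub / norow nocol nopercent;
run;
```

Unlike the running count over the whole data set, sub here restarts at 1 for every group, so each group ends up with subgroups 1 and 2.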
