My data is longitudinal.
VISIT ID VAR1
1 001 ...
1 002 ...
1 003 ...
1 004 ...
...
2 001 ...
2 002 ...
2 003 ...
2 004 ...
Our end goal is to pick out 10% of the IDs at each visit to run a test. I tried PROC SURVEYSELECT to do simple random sampling (SRS) without replacement, using VISIT as the strata variable. But the final sample can contain duplicate IDs. For example, ID=001 might be selected in both VISIT=1 and VISIT=2.
Is there any way to draw the sample so that no ID is selected at more than one visit, using SURVEYSELECT or another procedure (R is also fine)? Thanks a lot.
This is possible with some fairly creative data step programming. The code below uses a greedy approach: it samples from each visit in turn, drawing only from ids that have not already been sampled at an earlier visit. If more than 90% of the ids for a visit have already been sampled, fewer than 10% are output for that visit. In the extreme case, when every id for a visit has already been sampled, no rows are output for that visit.
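To make the greedy logic concrete, here is a minimal sketch in Python rather than in a data step. It is only an illustration of the idea described above, not the SAS code itself. The function name `greedy_sample`, the `(visit, id)` tuple layout, and the rounding of the 10% target to the nearest whole number are all assumptions made for the example, and it assumes each id appears at most once per visit.

```python
import random
from collections import defaultdict


def greedy_sample(rows, fraction=0.10, seed=42):
    """Sample `fraction` of the ids at each visit, never reusing an id.

    rows: iterable of (visit, subject_id) pairs, one per id per visit
    (hypothetical layout chosen for this sketch).
    """
    rng = random.Random(seed)

    # Group the ids by visit.
    ids_by_visit = defaultdict(list)
    for visit, subject_id in rows:
        ids_by_visit[visit].append(subject_id)

    already_sampled = set()
    sample = []
    # Greedy pass: handle the visits one at a time, in order.
    for visit in sorted(ids_by_visit):
        ids = ids_by_visit[visit]
        # Target size is based on all ids at this visit (assumption: round to nearest).
        target = round(fraction * len(ids))
        # Only ids not picked at an earlier visit are eligible.
        candidates = [i for i in ids if i not in already_sampled]
        # If too few candidates remain, take what is left (possibly nothing).
        chosen = rng.sample(candidates, min(target, len(candidates)))
        already_sampled.update(chosen)
        sample.extend((visit, i) for i in chosen)
    return sample


# Example: 3 visits with the same 40 ids at each visit.
rows = [(v, f"{i:03d}") for v in (1, 2, 3) for i in range(1, 41)]
for visit, subject_id in greedy_sample(rows):
    print(visit, subject_id)
```

Because earlier visits get first pick, later visits draw from a smaller pool, so the per-visit samples are not independent simple random samples. Whether that matters depends on the test being run.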