I have a dataset that looks something like this:
library(tidyverse)
library(TeachingSampling)
data(Lucy)
Lucy <- Lucy %>% select(ID,Level,Zone,Employees,SPAM)
head(Lucy)
ID Level Zone Employees SPAM
1 AB001 Small A 41 no
2 AB002 Small A 19 yes
3 AB003 Small A 68 no
4 AB004 Small A 89 no
5 AB005 Small A 91 yes
I would like to draw a random sample with probability proportional to size (PPS), where the size measure is Employees.
I would like to stratify by two variables, Level and SPAM, using proportional allocation, so that the sample size of each stratum is proportional to the population size of that stratum.
I know how to do this with one stratification variable, but I do not know how to do it with two:
N <- sum(Lucy$Employees) #pop size
Lucy %>% group_by(Level) %>%
summarise(insamp=round(100*sum(Employees)/N)) # sample size allocated to each stratum (total sample of 100)
summary(Level)
Level insamp
<fct> <dbl>
1 Big 8
2 Medium 39
3 Small 53
# Defines the sample size at each stratum
mh<-c(8,39,53)
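# Draws a stratified PPS sample (columns referenced through Lucy$, since Lucy is not attached)
res <- S.STPPS(Lucy$Level, Lucy$Employees, mh)
# The selected sample (row indices of the selected units)
ind <- res[,1]
# The information about the units in the sample is stored in an object called samp
samp <- Lucy[ind,]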
# check
samp %>% group_by(Level) %>%
summarise(n=n())
# A tibble: 3 × 2
Level n
<fct> <int>
1 Big 8
2 Medium 39
3 Small 53
Does anyone know how I can do this when combining several stratification variables, so that the sample is stratified not only by Level but also by SPAM?
In this case my sample should also have 39 observations with SPAM = no and 61 with SPAM = yes:
Lucy %>% group_by(SPAM) %>%
summarise(insamp=round(100*sum(Employees)/N))
# SPAM insamp
<fct> <dbl>
1 no 39
2 yes 61
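To make the goal concrete, here is a minimal sketch of one possible approach, not a confirmed solution. It assumes that crossing Level and SPAM into a single combined stratum label with base R's interaction() is acceptable, and that S.STPPS reads the sample sizes in factor-level order (the single-variable example above relies on the same ordering). The names strata and mh_cross are introduced only for illustration.

```r
library(TeachingSampling)
data(Lucy)

# Assumption: cross the two stratification variables into one factor,
# giving up to 3 x 2 = 6 combined strata (e.g. "Big.no", "Small.yes").
strata <- interaction(Lucy$Level, Lucy$SPAM, drop = TRUE)

# Allocation on the same basis as the single-variable code above:
# each combined stratum's share of total Employees, for a sample of about 100.
N <- sum(Lucy$Employees)
mh_cross <- as.vector(round(100 * tapply(Lucy$Employees, strata, sum) / N))
# tapply() returns one value per factor level, in level order.

# PPS draw within each combined stratum
res  <- S.STPPS(strata, Lucy$Employees, mh_cross)
ind  <- res[, 1]
samp <- Lucy[ind, ]

# Check the realised sample sizes per Level x SPAM cell
table(samp$Level, samp$SPAM)
```

Because each cell is rounded separately, the total may not be exactly 100, and the Level and SPAM margins may differ by a unit or so from the single-variable allocations shown above. If any cell rounds to 0 the draw for that cell may fail, so the allocation would need adjusting by hand in that case.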