How to take a Probability Proportional to Size (PPS) Unequal Probability sample using R?


I have very little programming experience, but I'm working on a statistics project and would like to generate an unequal probability sample where the inclusion probability of a unit is based on its size (PPS).

Basically, I have two datasets:

  • ds1 lists US states and the parameter I'm trying to estimate
  • ds2 has the population size of each state.

My questions:

  1. I want to use R to select a random sample from the first dataset using inclusion probabilities based on the population of each state (second dataset).

  2. Also, is there a way to use R to compute these Generalized Unequal Probability Estimator formulas?

[Formula image: Generalized Unequal Probability Estimator]

[Formula image: Estimated Variance of Generalized Unequal Probability Estimator]

A note on the formulas: pi_i is the inclusion probability of unit i, and pi_ij is the joint inclusion probability of units i and j.


There are 2 answers

Answer by smci

Yes, that's called weighted sampling. Simply set each state's weight to its size. Strictly, you don't even need to normalize the weights by 1/sum(sizes), although it's always good practice to. There are plenty of duplicate posts on SO showing how to do weighted sampling.
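As a minimal sketch of weighted sampling in base R, `sample()` takes a `prob` argument of weights that do not have to sum to one. The unit labels and sizes below are made-up toy values, not the asker's data. One caveat: with `replace = FALSE` the units are drawn one at a time from whatever remains, so the resulting inclusion probabilities are only approximately proportional to size.

```r
# Hypothetical toy data: five units with made-up sizes.
units <- c("A", "B", "C", "D", "E")
sizes <- c(10, 20, 30, 15, 25)

# Normalized weights (optional: sample() rescales prob internally).
p <- sizes / sum(sizes)

set.seed(1)  # arbitrary seed, only so the draws are reproducible

# Without replacement, passing the raw sizes as weights.
s_wor <- sample(units, size = 3, replace = FALSE, prob = sizes)

# With replacement, passing the normalized weights.
s_wr <- sample(units, size = 3, replace = TRUE, prob = p)

print(s_wor)
print(s_wr)
```

If the design needs inclusion probabilities that are exactly proportional to size, a dedicated PPS selection scheme is usually a better fit than plain `sample()`.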

The only tiny complication is that you need to join the datasets ds1 and ds2 on the state column. Show us what code you've tried if it's causing problems. I recommend using either dplyr or data.table for the join.
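A rough sketch of the join followed by the draw, using base R's `merge()` instead of dplyr or data.table so that it runs without extra packages. The data frames, the column names `state`, `y` and `population`, and all values are hypothetical stand-ins for the asker's ds1 and ds2.

```r
# Hypothetical stand-ins for ds1 and ds2; names and values are made up.
ds1 <- data.frame(state = c("AA", "BB", "CC", "DD"),
                  y     = c(1.2, 3.4, 2.2, 5.0))
ds2 <- data.frame(state      = c("AA", "BB", "CC", "DD"),
                  population = c(100, 400, 250, 250))

# Join the two tables on the shared key column.
ds <- merge(ds1, ds2, by = "state")

# Draw n row indices with weights proportional to population.
n <- 2
set.seed(42)  # arbitrary seed, only for reproducibility
idx <- sample(nrow(ds), size = n, replace = FALSE, prob = ds$population)

# The sampled rows keep both the study variable and the size measure.
pps_sample <- ds[idx, ]
print(pps_sample)
```

Sampling row indices rather than state names keeps the study variable and the size measure together in the result.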

Your second question should be asked separately. It is off-topic on SO, or at least won't get a great response here, so it's best to ask statistical questions at the sister site Cross Validated.
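On the purely computational side, here is a hedged base-R sketch. It assumes the formulas in question are the standard Horvitz-Thompson estimator of a total and its usual variance estimator. That is an assumption, and the question's formulas may differ (for example, a ratio-type version for the mean). The function names `ht_total` and `ht_var` are made up for illustration. `pij` is assumed to be the matrix of joint inclusion probabilities for the sampled units, with the first-order probabilities on its diagonal.

```r
# Horvitz-Thompson estimate of a population total (assumed form).
# y:  observed values for the sampled units
# pi: first-order inclusion probabilities of the sampled units
ht_total <- function(y, pi) {
  sum(y / pi)
}

# Horvitz-Thompson variance estimator (assumed form).
# pij: n x n matrix of joint inclusion probabilities, with pij[i, i] = pi[i]
ht_var <- function(y, pi, pij) {
  delta <- (pij - outer(pi, pi)) / pij  # diagonal reduces to 1 - pi[i]
  z <- y / pi                           # expanded values y_i / pi_i
  sum(delta * outer(z, z))
}

# Sanity check on a design where pi_i and pi_ij are known in closed form:
# simple random sampling without replacement, N = 5, n = 2.
N <- 5
n <- 2
y <- c(3, 7)                                         # made-up observations
pi <- rep(n / N, n)                                  # pi_i = n / N
pij <- matrix(n * (n - 1) / (N * (N - 1)), n, n)     # pi_ij for i != j
diag(pij) <- pi

total_hat <- ht_total(y, pi)
var_hat   <- ht_var(y, pi, pij)

# Textbook SRS variance estimate of the total, for comparison.
var_srs <- N^2 * (1 - n / N) * var(y) / n

print(c(total_hat = total_hat, var_hat = var_hat, var_srs = var_srs))
```

For an actual PPS design, the joint probabilities pi_ij depend on the selection scheme used, so they have to come from that scheme rather than from the SRS expression used in this check.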

Answer by bsrcube

There is an R package for exactly this, pps, and the documentation is here.

Also, there is another package called survey with a bit of documentation here.

I'm not sure of the difference between the two and haven't used them myself. Hope this is what you're looking for.