Generate population data with specific distribution in R

Question

Asked by user2657817 on 25 June 2015 at 23:18

I have a distribution of ages in a population.

For instance, you can imagine something like this:

Ages <24: 15%

Ages 25-49: 40%

Ages 50-60: 20%

Ages >60: 25%

I don't have the mean and standard deviation for each stratum/age group in the data. I am trying to generate a sample population of 1000 individuals where the generated data matches the distribution of ages shown above.

There are 2 answers

oyeoyeoye On 16 July 2018 at 08:54

This doesn't reproduce the group percentages you listed, but it does keep every generated age within fixed cut-offs (here 0 to 100). Hope it helps!

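# install.packages("truncnorm")  # run once if the package is not installed
library(truncnorm)

set.seed(123)  # for reproducibility
pop <- 1000    # number of individuals to generate

# Draw ages from a normal distribution truncated to [0, 100];
# set mean and sd to suit your population
ages <- rtruncnorm(n = pop, a = 0, b = 100, mean = 40, sd = 25)

summary(ages)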
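To see how far this is from the percentages in the question, the share of a truncated normal that falls in each age group can be computed directly with base R's pnorm. This is only a sketch: the break points 0, 25, 50, 60 and 100 are my assumption about where the groups start and end, and mu and sigma are simply the mean and sd used above.

```r
# Assumed group boundaries (not stated exactly in the question)
breaks <- c(0, 25, 50, 60, 100)
target <- c(0.15, 0.40, 0.20, 0.25)

# Parameters of the untruncated normal used with rtruncnorm above
mu <- 40
sigma <- 25

# Normal CDF at each boundary, then renormalise to the [0, 100] window
cdf <- pnorm(breaks, mean = mu, sd = sigma)
implied <- diff(cdf) / (cdf[length(cdf)] - cdf[1])

# Compare target and implied group shares side by side
round(rbind(target, implied), 3)
```

If the implied row is far from the target row, a single truncated normal is probably not a good fit, and you would have to tune mean and sd or sample group by group instead.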

**josliber** · Accepted Answer · 2015-06-25T23:34:51+00:00

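Let's put this data in a friendlier format, with one row per age group giving its lower bound, upper bound, and proportion (the overall range of 0 to 100 is an assumption):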

(dat <- data.frame(min=c(0, 25, 50, 60), max=c(25, 50, 60, 100), prop=c(0.15, 0.40, 0.20, 0.25)))
#   min max prop
# 1   0  25 0.15
# 2  25  50 0.40
# 3  50  60 0.20
# 4  60 100 0.25

We can sample 1000 row indices of the table, with replacement and weighted by prop, using the sample function:

set.seed(144)  # For reproducibility
rows <- sample(nrow(dat), 1000, replace=TRUE, prob=dat$prop)
table(rows)
# rows
#   1   2   3   4 
# 139 425 198 238
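Because sample draws the groups at random, the counts only approximate 150/400/200/250. If the group sizes need to match the proportions exactly, one option (an assumption about what may be wanted, not something the question asks for) is to build the group labels with rep and then shuffle them. The sketch below redefines dat so that it runs on its own; counts and rows_exact are names introduced here.

```r
dat <- data.frame(min = c(0, 25, 50, 60),
                  max = c(25, 50, 60, 100),
                  prop = c(0.15, 0.40, 0.20, 0.25))

n <- 1000
counts <- round(n * dat$prop)   # exact number of individuals per group
stopifnot(sum(counts) == n)     # guard against rounding drift for other n

set.seed(144)  # only affects the order, not the counts
# Repeat each row index counts[i] times, then shuffle the order
rows_exact <- sample(rep(seq_len(nrow(dat)), times = counts))

table(rows_exact)
# counts are 150, 400, 200, 250 regardless of the seed
```

rows_exact can then be used in place of rows in the age-sampling step.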

To sample actual ages you will need to define a distribution over the ages represented by each row. A simple one would be uniformly distributed ages:

age <- round(dat$min[rows] + runif(1000) * (dat$max[rows] - dat$min[rows]))
table(age)
# age
#   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27 
#   2   5   5   3   7   7   9   6   7   6   1   7   7   5   5   6   2   4   6   7   4  11   8   2   3  10  11  13 
#  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55 
#  19  16  20  16  18  21  16  19  14  20  15  13  18  15  24  20  16  16  29  16  11  12  18  17  17  26  27  21 
#  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83 
#  17  26  11  13  20   3   8   9   6   4   3   3   5   4   3   3   5   8   3  13   5   6   4   7   9   9   6   4 
#  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 
#   5   5   9   9   5   6   8   9   5   4   6   5   9   6   8   4   1

Of course, if sampling ages uniformly within each range is inappropriate for your application, you would need to replace runif with some other distribution over each range.
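One thing to watch with round: a boundary age such as 25 can come from either the first or the second row, so the shares of integer ages drift slightly from the table. A minimal sketch of one alternative, assuming each row is meant as the half-open range [min, max), uses floor instead and then checks the result with cut. The names n, age_floor and grp are introduced here, and dat is redefined so the block runs on its own.

```r
dat <- data.frame(min = c(0, 25, 50, 60),
                  max = c(25, 50, 60, 100),
                  prop = c(0.15, 0.40, 0.20, 0.25))

set.seed(144)
n <- 1000
rows <- sample(nrow(dat), n, replace = TRUE, prob = dat$prop)

# floor() maps each uniform draw to an integer age in min .. max - 1,
# so every age stays inside the group it was drawn for
age_floor <- floor(dat$min[rows] + runif(n) * (dat$max[rows] - dat$min[rows]))

# Recover the group of each age; right = FALSE gives intervals [min, max)
grp <- cut(age_floor, breaks = c(dat$min, max(dat$max)), right = FALSE)

all(as.integer(grp) == rows)  # TRUE: no age leaked into a neighbouring group
prop.table(table(grp))        # observed share of each group
```

With this version the largest possible age is 99 rather than 100; widen the last row's max if 100 should be included.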
