I have a distribution of ages in a population.
For instance, you can imagine something like this:
Ages <24: 15%
Ages 25-49: 40%
Ages 50-60: 20%
Ages >60: 25%
I don't have the mean and standard deviation for each stratum/age group in the data. I am trying to generate a sample population of 1000 individuals where the generated data matches the distribution of ages shown above.
Let's put this data in a more friendly format:
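One way to tabulate the groups is sketched below, assuming Python with pandas (the question does not name a language). The column names and the bounds for the open-ended groups (0 and 90) are assumptions made for illustration. The first group is also assumed to run through age 24, so that no age falls between "<24" and "25-49".

```python
import pandas as pd

# One row per age group. min_age/max_age are inclusive bounds.
# Assumptions (not from the question): the youngest group starts at 0 and
# includes 24, and the oldest group is capped at 90.
age_groups = pd.DataFrame({
    "label": ["<24", "25-49", "50-60", ">60"],
    "min_age": [0, 25, 50, 61],
    "max_age": [24, 49, 60, 90],
    "share": [0.15, 0.40, 0.20, 0.25],
})

print(age_groups)
```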
We can then draw 1000 rows from the table, with replacement and weighted by each group's share, using the sample function:
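A minimal sketch of this step, assuming the table is a pandas DataFrame and that "the sample function" is `DataFrame.sample`. The column names, the bucket bounds, and the fixed seed are illustrative assumptions.

```python
import pandas as pd

# Assumed table layout: one row per age group with its population share.
age_groups = pd.DataFrame({
    "label": ["<24", "25-49", "50-60", ">60"],
    "min_age": [0, 25, 50, 61],   # 0 is an assumed lower bound
    "max_age": [24, 49, 60, 90],  # 90 is an assumed upper bound
    "share": [0.15, 0.40, 0.20, 0.25],
})

# Draw 1000 rows with replacement; each row's chance of being picked is
# proportional to its share. random_state only makes the draw repeatable.
sampled = age_groups.sample(
    n=1000, replace=True, weights="share", random_state=0
).reset_index(drop=True)

# Observed proportions should land close to the target shares.
print(sampled["label"].value_counts(normalize=True))
```

Because the rows are drawn at random, the observed proportions will be close to the targets but not exactly equal to them.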
To sample actual ages you will need to define a distribution over the ages represented by each row. A simple one would be uniformly distributed ages:
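The uniform choice could look like the following sketch, again assuming pandas and NumPy. The bounds for the open-ended groups (0 and 90), the column names, and the seeds are assumptions, not values from the question.

```python
import numpy as np
import pandas as pd

# Assumed table layout with inclusive integer bounds per age group.
age_groups = pd.DataFrame({
    "label": ["<24", "25-49", "50-60", ">60"],
    "min_age": [0, 25, 50, 61],   # 0 is an assumed lower bound
    "max_age": [24, 49, 60, 90],  # 90 is an assumed upper bound
    "share": [0.15, 0.40, 0.20, 0.25],
})

# Step 1: pick a group for each of the 1000 individuals, weighted by share.
sampled = age_groups.sample(
    n=1000, replace=True, weights="share", random_state=0
).reset_index(drop=True)

# Step 2: draw one integer age per individual, uniform within its group.
# Generator.integers excludes the upper bound, hence the + 1.
rng = np.random.default_rng(0)
sampled["age"] = rng.integers(
    sampled["min_age"].to_numpy(), sampled["max_age"].to_numpy() + 1
)

print(sampled[["label", "age"]].head())
```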
Of course, if sampling ages uniformly within each range is inappropriate for your application, you would need to choose some other distribution for drawing ages within each bucket.
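As one illustration of swapping in a different distribution, the sketch below uses a triangular distribution so that ages cluster near the middle of a bucket. The triangular shape is an arbitrary choice made for this example, not something the question specifies, and the bucket bounds shown are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inclusive bounds for 1000 already-sampled individuals
# (here all from the "25-49" group, to keep the example short).
min_age = np.full(1000, 25)
max_age = np.full(1000, 49)

# Triangular draw on [min_age, max_age + 1), peaked at the midpoint,
# then floored to whole years.
right = max_age + 1
mode = (min_age + right) / 2
ages = np.floor(rng.triangular(min_age, mode, right)).astype(int)

# Guard against a draw landing exactly on the right edge.
ages = np.clip(ages, min_age, max_age)

print(ages[:10])
```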