Randomly generate height using stature data


I have a float with a character's age in months, and an array with average height data.

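float characterAge; // age in months
// Each row: age in months, then the 3rd, 5th, 10th, 25th, 50th, 75th, 90th, 95th and 97th percentile heights (cm)
private float[,] ageHeightData = new float[217, 10] {
    { 180.5f, 149.7416f, 151.2611f, 153.604f, 157.5271f, 161.898f, 166.2812f, 170.2366f, 172.6084f, 174.1505f },
    ...
};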

I'd like to randomly generate the character's height based on their age. I'm using these charts from the CDC on average height by age: here

The chart breaks height down into percentiles.

So, for a character aged 180.5 months (about 15 years), the percentile heights (in cm) are:

149.7416 (3rd percentile)
151.2611 (5th percentile)
153.604 (10th percentile)
157.5271 (25th percentile)
161.898 (50th percentile)
166.2812 (75th percentile)
170.2366 (90th percentile)
172.6084 (95th percentile)
174.1505 (97th percentile)

So 161.898 (the 50th percentile, i.e. the median) is the "average" height, 149 is short, 174 is tall, and so on. Given the character's known age and this data, how can I generate random heights with (roughly) the right weighting? If I generate 100 heights, I want most of them to be near the average, fewer to be short or tall, and even fewer to be very short or very tall.

Answer by Eric Lippert:

You wish to generate random but non-uniform data, presumably given a uniform source of randomness: say, the Random.NextDouble method, which returns a (roughly) uniformly distributed number that is greater than or equal to 0.0 and less than 1.0.

That's easy. If Q(age, x) is the quantile function of your distribution, parameterized by age, then you can simply pass the double from NextDouble to it as x, and the result will be a random number drawn from your desired distribution. (This technique is known as inverse transform sampling.)
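A minimal C# sketch of the idea, assuming nothing beyond System.Random. The names InverseTransform, Sample and ExponentialQuantile are hypothetical, and the exponential distribution is used only because its quantile function has a simple closed form; for heights you would substitute your own Q(age, x) with the age held fixed.

```csharp
using System;

static class InverseTransform
{
    // Draws one value from a distribution, given its quantile function (the inverse of its CDF).
    // A uniform draw x from NextDouble (0 <= x < 1) is pushed through the quantile function.
    public static double Sample(Func<double, double> quantile, Random rng)
    {
        return quantile(rng.NextDouble());
    }

    // Illustrative quantile function with a closed form: the exponential distribution with rate lambda.
    // Its CDF F(h) = 1 - exp(-lambda * h) inverts to Q(x) = -ln(1 - x) / lambda.
    public static double ExponentialQuantile(double x, double lambda)
    {
        return -Math.Log(1.0 - x) / lambda;
    }
}
```

Calling `InverseTransform.Sample(x => InverseTransform.ExponentialQuantile(x, 2.0), new Random())` many times yields values averaging about 0.5. For the height problem the lambda would instead capture the age, along the lines of `x => Q(characterAge, x)`, where Q is whatever quantile function you build from the table.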

Because you have the percentile data, you already know the cumulative distribution curve, or at least enough points on it to approximate it for a given age, so start by doing that. (If instead you had the probability density curve, you would integrate it to get the cumulative distribution curve.)
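One way to approximate the curve for an arbitrary age is to find the two table rows that bracket it and linearly interpolate each percentile column. This is a sketch that assumes the row layout described in the question (age in months, then the nine percentile heights) with rows sorted by age; StatureTable and PercentileHeightsAt are hypothetical names, and linear interpolation between rows is an assumption, not something the CDC data prescribes.

```csharp
using System;

static class StatureTable
{
    // Cumulative probabilities of the nine percentile columns (3rd through 97th).
    public static readonly double[] Probabilities =
        { 0.03, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.97 };

    // Hypothetical helper. Assumes each row is { ageInMonths, p3, p5, p10, p25, p50, p75, p90, p95, p97 },
    // at least two rows, sorted by ascending age. Returns the nine percentile heights at ageMonths,
    // linearly interpolated between the bracketing rows; outside the table it returns the first or last row.
    public static double[] PercentileHeightsAt(float[,] table, double ageMonths)
    {
        int rows = table.GetLength(0);

        // First row at or above the requested age, but never row 0 and never past the last row.
        int hi = 1;
        while (hi < rows - 1 && table[hi, 0] < ageMonths) hi++;
        int lo = hi - 1;

        // Fraction of the way from row lo to row hi, clamped to [0, 1].
        double t = (ageMonths - table[lo, 0]) / (table[hi, 0] - table[lo, 0]);
        t = Math.Max(0.0, Math.Min(1.0, t));

        var heights = new double[Probabilities.Length];
        for (int j = 0; j < heights.Length; j++)
        {
            double below = table[lo, j + 1];
            double above = table[hi, j + 1];
            heights[j] = below + t * (above - below);
        }
        return heights;
    }
}
```

With the question's array this would be called as `StatureTable.PercentileHeightsAt(ageHeightData, characterAge)`. Each returned `heights[j]` together with `Probabilities[j]` is one point on the approximate cumulative distribution curve for that age.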

Obviously the cumulative distribution is monotone increasing from zero to one, so you get the quantile function by taking its inverse: percentile in, height out. For example, with your data Q(180.5, 0.5) = 161.898.
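Inverting a piecewise-linear approximation of the CDF amounts to linear interpolation with the axes swapped. The sketch below, with hypothetical names HeightSampler, Quantile and SampleHeight, takes the nine percentile heights for one age (for example the 180.5-month row from the question). It makes one extra assumption the data cannot settle: below the 3rd and above the 97th percentile it extends the outermost segments linearly out to x = 0 and x = 1, so generated heights are bounded rather than having the longer tails of a real population.

```csharp
using System;

static class HeightSampler
{
    // Cumulative probabilities of the nine percentile heights (3rd through 97th).
    static readonly double[] Probabilities =
        { 0.03, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.97 };

    // Approximate quantile function for one age: maps a probability x to a height.
    // Between known percentiles it interpolates linearly (the inverse of a piecewise-linear CDF).
    // Assumption: outside [0.03, 0.97] the first or last segment is extended linearly.
    public static double Quantile(double[] heights, double x)
    {
        var p = Probabilities;
        int last = p.Length - 1;
        int i;
        if (x <= p[0]) i = 0;                 // lower tail: extend the first segment
        else if (x >= p[last]) i = last - 1;  // upper tail: extend the last segment
        else
        {
            i = 0;
            while (p[i + 1] < x) i++;         // find the segment p[i] < x <= p[i + 1]
        }
        double t = (x - p[i]) / (p[i + 1] - p[i]);
        return heights[i] + t * (heights[i + 1] - heights[i]);
    }

    // Inverse transform sampling: push a uniform draw through the quantile function.
    public static double SampleHeight(double[] heights, Random rng)
    {
        return Quantile(heights, rng.NextDouble());
    }
}
```

With the 180.5-month row, `Quantile(h, 0.5)` returns the median 161.898; over many calls to `SampleHeight`, roughly half of the results fall below that, about 3% fall below 149.7416, and every result lies between roughly 147.5 and 176.5 cm. Fitting a smooth curve or a parametric distribution per age would be alternatives; linear interpolation is simply the easiest to check against the table.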

For a longer explanation with pretty pictures, see

https://ericlippert.com/2012/02/21/generating-random-non-uniform-data/

Now, a puzzle for you: given the data in your array, can you go the other way? That is, given the distributions and the height, can you give an accurate distribution of the likely age?