This question is related to: Selecting Percentile curves using gamlss::lms in R
I can get centile curve from following data and code:
age = sample(5:15, 500, replace=T)
yvar = rnorm(500, age, 20)
mydata = data.frame(age, yvar)
head(mydata)
age yvar
1 12 13.12974
2 14 -18.97290
3 10 42.11045
4 12 27.89088
5 11 48.03861
6 5 24.68591
h = lms(yvar, age , data=mydata, n.cyc=30)
centiles(h,xvar=mydata$age, cent=c(90), points=FALSE)
How can I now get yvar on the curve for each of x value (5:15) which would represent 90th percentiles for data after smoothening?
I tried to read help pages and found fitted(h) and fv(h) to get fitted values for entire data. But how to get values for each age level at 90th centile curve level? Thanks for your help.
Edit: Following figure show what I need:
I tried following but it is correct since value are incorrect:
mydata$fitted = fitted(h)
aggregate(fitted~age, mydata, function(x) quantile(x,.9))
age fitted
1 5 6.459680
2 6 6.280579
3 7 6.290599
4 8 6.556999
5 9 7.048602
6 10 7.817276
7 11 8.931219
8 12 10.388048
9 13 12.138104
10 14 14.106250
11 15 16.125688
The values are very different from 90th quantile directly from data:
> aggregate(yvar~age, mydata, function(x) quantile(x,.9))
age yvar
1 5 39.22938
2 6 35.69294
3 7 25.40390
4 8 26.20388
5 9 29.07670
6 10 32.43151
7 11 24.96861
8 12 37.98292
9 13 28.28686
10 14 43.33678
11 15 44.46269
See if this makes sense. The 90th percentile of a normal distribution with mean and sd of 'smn' and 'ssd' is
qnorm(.9, smn, ssd)
: So this seems to deliver (somewhat) sensible results, albeit not the full hack ofcentiles
that I suggested:(Note the massive overplotting from only a few distinct xvars but 500 points. Ande you may want to set the ylim so that the full range can be appreciated.)
The caveat here is that you need to check the other parts of the model to see if it is really just an ordinary Normal model. In this case it seems to be:
So the model is just mean-estimate with a fixed-variance (the
~1
) across the range of thexvar
, and there are no complications from higher order parameters like a Box-Cox model. (And I'm unable to explain why this is not the same as the plotted centiles. For that you probably need to correspond with the package authors.)