I've noticed that many R modelling functions accept a "weights" argument (e.g. cart, loess, gam, ...). Most of the help pages describe it as "prior weights" for the data, but what does that actually mean?
I have data with many repeated cases and a binary response. I was hoping I could use "weights" to encode how many times each combination of inputs and response occurs, but this doesn't seem to work. I've also tried making the response the proportion of successes and the weight the total number of trials for each combination of covariates, but this doesn't seem to work either (at least for gam). I'd like to do this for all of the model types listed above, but for starters: how can I do it for gam (mgcv package)?
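To make the two attempts concrete, here is a rough sketch with invented data; all object names below (x, trials, succ, dat1, dat2, ...) are made up for illustration and are not from my real data:

    library(mgcv)

    ## Invented example: 30 distinct covariate values, each observed several times
    set.seed(1)
    x      <- seq(0, 1, length.out = 30)
    trials <- sample(5:15, 30, replace = TRUE)             # repetitions per x value
    succ   <- rbinom(30, size = trials, prob = plogis(3 * x - 1.5))
    succ   <- pmin(pmax(succ, 1), trials - 1)              # keep both outcomes present

    ## Attempt 1: one row per distinct (x, y) combination, with the frequency
    ## of that combination passed as "weights"
    dat1 <- data.frame(x = rep(x, 2),
                       y = rep(c(1, 0), each = 30),
                       n = c(succ, trials - succ))
    fit1 <- gam(y ~ s(x), family = binomial, weights = dat1$n, data = dat1)

    ## Attempt 2: response = proportion of successes at each covariate value,
    ## with the total number of trials passed as "weights"
    dat2 <- data.frame(x = x, prop = succ / trials, trials = trials)
    fit2 <- gam(prop ~ s(x), family = binomial, weights = dat2$trials, data = dat2)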
I also used to think the weights were a convenient way of encoding sample sizes for repeated observations. But the following example shows that this is not the case for a simple linear model. I first define a contingency table with observed/invented shoe sizes and heights of people and fit a least squares regression specifying the frequencies as the weights:
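The code looked roughly like this. The particular numbers below are invented for illustration, so the estimates and p-values will not match my output exactly, but the structure is the same: 12 distinct combinations and 165 observations in total.

    ## Frequency-form contingency table: one row per distinct combination of
    ## height and shoe size, with Freq giving how often it was observed
    ## (these particular numbers are invented for illustration)
    tab <- data.frame(
      height = seq(170, 192, by = 2),
      shoe   = c(41, 43, 42, 41, 44, 42, 44, 43, 42, 45, 43, 44),
      Freq   = c( 4,  8, 11, 15, 19, 22, 24, 20, 17, 12,  8,  5)
    )
    sum(tab$Freq)   # 165 observations in total, spread over 12 rows

    ## Least-squares fit with the frequencies supplied as weights
    fit_weighted <- lm(shoe ~ height, data = tab, weights = Freq)
    summary(fit_weighted)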
Notice that the coefficient for the slope is non-significant and that the residual standard error is based on "10 degrees of freedom".
This changes when I convert the contingency table into the "raw" data, i.e. one row per observation, using the convenience function expand.dft:
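Continuing the sketch above (expand.dft is assumed here to be the version shipped in the vcdExtra package; essentially the same function also circulates as a stand-alone snippet on R-help):

    ## Expand the frequency table to case form: one row per observation
    library(vcdExtra)
    raw <- expand.dft(tab, freq = "Freq")
    nrow(raw)   # 165 rows

    ## Base-R equivalent, in case expand.dft is not available:
    ## raw <- tab[rep(seq_len(nrow(tab)), tab$Freq), c("height", "shoe")]

    ## Ordinary (unweighted) least-squares fit on the expanded data
    fit_raw <- lm(shoe ~ height, data = raw)
    summary(fit_raw)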
We obtain the identical coefficient, but this time it is highly significant, because the fit is now based on "163 degrees of freedom".