I have a data set that I am exploring using multiple regression in R. My model is as follows:
model <- lm(Trait ~ Noise + PC1 + PC2)
where Noise, PC1, and PC2 are continuous covariates that predict a particular Trait that is also continuous.
The summary(model) call shows that both Noise and PC1 significantly affect changes in Trait, just in opposite ways. Trait increases as 'Noise' increases, but decreases as PC1 increases.
To tease apart this relationship, I want to create simulated data sets based on the sample size (45) of my original data set and by manipulating Noise and PC1 within the parameters seen in my data set, so: high levels of both, low levels of both, high of one and low of the other, etc...
Can someone offer up some advice on how to do this? I am not overly familiar with R, so I apologize if this question is overly simple.
Thank you for your time.
It's a bit unclear what you're looking for (this should probably be on Cross Validated), but here's a start and an approximate description of linear regression.
Let's say I have some datapoints that are 3 dimensional (Noise, PC1, PC2), and you say there's 45 of them. These data are randomly distributed around this 3 dimensional space. Now we imagine there's another variable that we're particularly interested in, called Trait. We think that the variations in each of Noise, PC1, and PC2 can explain some of the variation observed in Trait. In particular, we think that each of those variables is linearly proportional to Trait, so it's just the basic old y = m*x + b linear relationship you've seen before, but with a different slope m for each of the variables. So in total we imagine Trait = m1*Noise + m2*PC1 + m3*PC2 + b, plus some added noise (it's a shame one of your variables is named Noise, that's confusing). So going back to simulating some data, we'll just pick some values for these slopes and put them in a vector called beta.
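Something along these lines, say (the particular distributions and numbers here are just illustrative picks, not anything from your data; the only deliberate choices are opposite-signed slopes for Noise and PC1 and a small slope for PC2):

    set.seed(1)    # for reproducibility
    n <- 45        # same sample size as the original data

    # simulate the three predictors; standard normal draws are just a placeholder choice
    Noise <- rnorm(n)
    PC1   <- rnorm(n)
    PC2   <- rnorm(n)

    # intercept b, then slopes m1 (Noise), m2 (PC1), m3 (PC2)
    beta <- c(1, 2, -1.5, 0.1)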
So the model Trait = m1*Noise + m2*PC1 + m3*PC2 + b might also be expressed with simple matrix multiplication, and we can do it in R as below, where we've added Gaussian noise of standard deviation equal to 1.
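A minimal sketch, continuing with the objects defined above:

    X <- cbind(1, Noise, PC1, PC2)                     # design matrix with an intercept column
    Trait <- as.vector(X %*% beta) + rnorm(n, sd = 1)  # matrix multiply, then Gaussian noise (sd = 1)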
So this is the 'simulated data' underlying a linear regression model. Just as a sanity check, let's try fitting a regression to it and see whether it recovers the slopes we picked.
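For example, fitting the same formula from the question to the simulated variables above (sim_model is just a throwaway name):

    sim_model <- lm(Trait ~ Noise + PC1 + PC2)
    summary(sim_model)    # estimates should land reasonably close to beta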
So notice that the slope we picked for PC2 was so small (0.1) relative to the overall variability in the data that it isn't detected as a statistically significant predictor. And the other two variables have opposite effects on Trait. So in simulating data, you might adjust the observed ranges of the variables, as well as the magnitudes of the regression coefficients beta.
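For instance, one rough way to get at the high/low combinations from the question is to fix Noise and PC1 at chosen levels instead of drawing them at random, and then simulate Trait for each of the four combinations; the 10%/90% quantile cutoffs and the reuse of beta below are assumptions, and with your real data you'd take the levels from the observed ranges instead:

    # hypothetical 'low' and 'high' levels; here from the simulated predictors,
    # in practice from the ranges seen in your real data
    noise_levels <- quantile(Noise, probs = c(0.1, 0.9))
    pc1_levels   <- quantile(PC1,  probs = c(0.1, 0.9))

    # all four high/low combinations, 45 simulated points each, PC2 held at its mean
    combos <- expand.grid(Noise = noise_levels, PC1 = pc1_levels)
    sim <- combos[rep(seq_len(nrow(combos)), each = 45), ]
    sim$PC2 <- mean(PC2)
    sim$Trait <- with(sim, beta[1] + beta[2]*Noise + beta[3]*PC1 + beta[4]*PC2) +
                 rnorm(nrow(sim), sd = 1)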