I have a data set that I am exploring using multiple regression in R. My model is as follows:
model <- lm(Trait ~ Noise + PC1 + PC2)
where `Noise`, `PC1`, and `PC2` are continuous covariates that predict a `Trait` that is also continuous.
The `summary(model)` call shows that both `Noise` and `PC1` significantly affect `Trait`, just in opposite ways: `Trait` increases as `Noise` increases, but decreases as `PC1` increases.
To tease apart this relationship, I want to create simulated data sets matching the sample size (45) of my original data set, manipulating `Noise` and `PC1` within the ranges seen in my data: high levels of both, low levels of both, high of one and low of the other, etc.
Can someone offer up some advice on how to do this? I am not overly familiar with R, so I apologize if this question is overly simple.
Thank you for your time.
It's a bit unclear what you're looking for (this should probably be on Cross Validated), but here's a start and an approximate description of linear regression.
Let's say I have some data points that are 3-dimensional (`Noise`, `PC1`, `PC2`), and you say there are 45 of them. These data are randomly distributed around this 3-dimensional space. Now we imagine there's another variable that we're particularly interested in, called `Trait`. We think that the variation in each of `Noise`, `PC1`, and `PC2` can explain some of the variation observed in `Trait`. In particular, we think that each of those variables is linearly related to `Trait`, so it's just the basic old `y = mx + b` linear relationship you've seen before, but with a different slope `m` for each variable. So in total we imagine

`Trait = m1*Noise + m2*PC1 + m3*PC2 + b`

plus some added noise (it's a shame one of your variables is named `Noise`, that's confusing).

So going back to simulating some data, we'll just pick some values for these slopes and put them in a vector called `beta`. The model above can then be expressed as a simple matrix multiplication, and we can do it in R as follows, where we've added Gaussian noise of standard deviation equal to 1.
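A minimal sketch of that simulation (the particular slope values 2 and -2, the intercept, and the seed are arbitrary choices for illustration; the small 0.1 slope on `PC2` is deliberate, as discussed below):

```r
set.seed(42)   # for reproducibility
n <- 45        # same sample size as the original data

# Predictors scattered randomly in 3-D space
Noise <- rnorm(n)
PC1   <- rnorm(n)
PC2   <- rnorm(n)

# Slopes (m1, m2, m3): opposite signs for Noise and PC1, a tiny slope for PC2
beta <- c(2, -2, 0.1)
b    <- 1      # intercept

# Trait = X %*% beta + b, plus Gaussian noise with standard deviation 1
X     <- cbind(Noise, PC1, PC2)
Trait <- drop(X %*% beta + b + rnorm(n))
```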
So this is the 'simulated data' underlying a linear regression model. Just as a sanity check, let's try fitting the model to the simulated data and see whether we recover the slopes we picked.
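A self-contained sketch of that sanity check (the coefficient values and seed are the same illustrative assumptions, not estimates from the asker's data):

```r
set.seed(42)
n     <- 45
Noise <- rnorm(n)
PC1   <- rnorm(n)
PC2   <- rnorm(n)
beta  <- c(2, -2, 0.1)   # illustrative slopes: opposite signs, tiny PC2 effect
Trait <- drop(cbind(Noise, PC1, PC2) %*% beta + 1 + rnorm(n))

# Fit the same model the asker used and inspect the estimated coefficients
model <- lm(Trait ~ Noise + PC1 + PC2)
summary(model)
```

The fitted coefficients should land near the `beta` values we chose, typically with `Noise` and `PC1` flagged as significant and `PC2` not.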
So notice that the slope we picked for `PC2` was so small (`0.1`) relative to the overall variability in the data that it isn't detected as a statistically significant predictor, while the other two variables have opposite effects on `Trait`. So in simulating data, you might adjust the observed ranges of the variables, as well as the magnitudes of the regression coefficients `beta`.
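To get at the high/low scenarios the question asks about, one sketch is to shift the means of the simulated predictors while keeping the same coefficients (the shift of 2 standard deviations, like the slopes, is an arbitrary assumption):

```r
set.seed(42)
n    <- 45
beta <- c(2, -2, 0.1)   # same illustrative slopes as in the text
b    <- 1

# Scenario: high Noise, low PC1 (each shifted by 2 standard deviations)
Noise <- rnorm(n, mean =  2)
PC1   <- rnorm(n, mean = -2)
PC2   <- rnorm(n)

Trait <- drop(cbind(Noise, PC1, PC2) %*% beta + b + rnorm(n))
mean(Trait)   # under these slopes, both shifts push Trait upward
```

Repeating this with the other combinations of means (low/low, low/high, high/high) gives the full set of scenarios, and you can compare the resulting `Trait` distributions across them.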