Formatting dataframes for statistical analyses


What I would like to do is test the statistical relationship between one response variable and one explanatory variable. I assumed a one-way ANOVA would be an appropriate procedure, but my dataframe is not set up for it: I have one column for the response variable (df1) but several columns (df2 and df3, below) that all belong to the single explanatory variable I am interested in. As a crude example, df2 and df3 represent one season (summer) at 2 separate locations. How would I test the influence of summer on the response variable in this instance?

df1 <- as.data.frame(matrix(sample(0:1000, 36*10, replace=TRUE), ncol=1))
df2 <- as.data.frame(matrix(sample(0:500, 36*10, replace=TRUE), ncol=1))
df3 <- as.data.frame(matrix(sample(0:200, 36*10, replace=TRUE), ncol=1))
Example <- cbind(df1,df2,df3)

Would this involve restructuring the dataframe so that df2 and df3 merge into one long column, twice the length of df1?
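Restructuring into this "long" format is the usual route. A minimal sketch with small made-up values (the data frame `summer` and the column names `Location1`/`Location2` are hypothetical) shows that base R's `stack()` turns two columns into one `values` column of twice the length, plus an `ind` factor recording which column each value came from:

```r
# Hypothetical toy data: one column of values per location
summer <- data.frame(Location1 = c(10, 20, 30),
                     Location2 = c(1, 2, 3))

# stack() returns a data frame with columns 'values' and 'ind'
long <- stack(summer)
print(long)
nrow(long)  # 6, i.e. twice nrow(summer)
```

Note that `stack()` takes the labels in `ind` from the column names, so the columns need distinct names for the locations to stay distinguishable.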

Thank you in advance for any help!

Accepted answer by James White:

As suggested by Jaap and Andrew Taylor, the problem was one of formatting the data for a linear regression. This was achieved with the 'stack' and 'cbind' functions.

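df1 <- as.data.frame(matrix(sample(0:1000, 36*10, replace=TRUE), ncol=1))  # response, 360 rows
df2 <- as.data.frame(matrix(sample(0:500, 36*10, replace=TRUE), ncol=1))   # location 1
df3 <- as.data.frame(matrix(sample(0:200, 36*10, replace=TRUE), ncol=1))   # location 2
Example <- cbind(df2,df3)
# Both columns are named "V1"; rename them so that stack() can record
# which location each value came from
colnames(Example) <- c("Location1","Location2")
Stacked <- stack(Example)        # one long 'values' column (720 rows) plus an 'ind' factor
Combined <- cbind(df1,Stacked)   # df1's 360 rows are recycled to fill 720 rows
colnames(Combined) <- c("Response","Explanatory","Variable")
# The response goes on the left-hand side of the formula
Linear <- lm(Response~Explanatory, data = Combined)
summary(Linear)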

'stack' put all the explanatory values (df2 and df3) into one column, alongside a second column recording which original column each value came from. 'cbind' then combined these with the response values from df1; because df1 has only half as many rows, its values were recycled to match the number of rows in the stacked data, as per SabDeM's comment.
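The recycling step can be checked on a small example. In this sketch (the data frames `response` and `locations` and their values are hypothetical), a three-row response column is bound to a six-row stacked data frame, and `cbind()` repeats the response values to fill all six rows:

```r
# Hypothetical toy data: 3 response values and two 3-value location columns
response  <- data.frame(V1 = c(5, 7, 9))
locations <- data.frame(Location1 = c(10, 20, 30),
                        Location2 = c(1, 2, 3))

# cbind() on data frames builds a new data frame, recycling the 3-row
# response so that it matches the 6 rows produced by stack()
combined <- cbind(response, stack(locations))
colnames(combined) <- c("Response", "Explanatory", "Variable")
print(combined)
```

Each response value now appears once per location, so the same response is paired with the value from each site.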