I would like to run a loop over each category of one of the variables, fit a regression for each category, and produce a prediction from each regression, so that I can take the difference between the summed predictions and the target variable. Here is my toy data and code:
```r
df <- read.table(text = "target birds wolfs snakes
3 9 7 a
3 8 4 b
1 2 8 c
1 2 3 a
1 8 3 a
6 1 2 a
6 7 1 b
6 1 5 c
5 9 7 c
3 8 7 c
4 2 7 b
1 2 3 b
7 6 3 c
6 1 1 a
6 3 9 a
6 1 1 b ",header = TRUE)
```
I wrote the code below to get the result of the calculation described above, but it fails with an error.
Here is the code:
```r
b <- list()
for (i in c("a", "b", "c")) {
  lmModel <- lm(target ~ birds + wolfs, data = subset(df, snakes == i))
  b[i] <- sum(predict(lmModel, newdata = subset(df, snakes == i))) - sum(df$target[which(df$snakes == 'a'), ])
}
b <- as.numeric(b)
b
```
I got this error:
Error in df$target[which(df$snakes == "a"), ] : incorrect number of dimensions
How can I solve this issue?
The problem arises from your mixture of subsetting types here:
df$target[which(df$snakes=='a'),]
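Once you use `$`, the output is no longer a data.frame but a plain vector, so the two-argument `[` subsetting is no longer valid. You are better off compacting it to a single subsetting step: either drop the trailing comma, or index the data frame itself by row condition and column name.

As for your model, you can just create one with `snakes` as a covariate, and use the predictions from that to sum within the `snakes` groups.

To get the final output of your `b` variable, subtract the summed target for group a from each group's summed prediction, but note that there is a small numerical discrepancy for the a value.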
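The fix and the single-model approach described above might look like the following sketch. The answer does not say how `snakes` enters the model, so the additive term here is an assumption, and the names `target_a`, `fit`, and `pred_sums` are only illustrative.

```r
# Toy data from the question
df <- read.table(text = "target birds wolfs snakes
3 9 7 a
3 8 4 b
1 2 8 c
1 2 3 a
1 8 3 a
6 1 2 a
6 7 1 b
6 1 5 c
5 9 7 c
3 8 7 c
4 2 7 b
1 2 3 b
7 6 3 c
6 1 1 a
6 3 9 a
6 1 1 b", header = TRUE)

# Fix for the error: df$target is a vector, so use one-dimensional `[`
# (no trailing comma).
target_a <- sum(df$target[df$snakes == "a"])
# Equivalent form, subsetting the data frame itself:
# sum(df[df$snakes == "a", "target"])

# One model with snakes as a covariate (additive term: an assumption)
fit <- lm(target ~ birds + wolfs + snakes, data = df)

# Sum the fitted values within each snakes group
pred_sums <- tapply(predict(fit), df$snakes, sum)

# Same quantity the loop in the question was aiming for
b <- pred_sums - target_a
b
```

Because this fit includes an intercept and a dummy for each `snakes` level, the residuals sum to zero within every group, so each group's summed prediction matches its summed target. With this data `b` should therefore come out close to 0, -3 and -1 for a, b and c, with the a value possibly off from zero by floating-point error.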
Alternatively, and to check, you can fit each group separately by specifying the subset of data via the `subset` argument to `lm`.
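As a rough sketch of that check, the loop from the question can be kept almost as is, passing the row condition through `subset` and using the corrected one-dimensional subsetting for the target sum. The name `b_check` is illustrative and not from the original answer.

```r
# Toy data from the question
df <- read.table(text = "target birds wolfs snakes
3 9 7 a
3 8 4 b
1 2 8 c
1 2 3 a
1 8 3 a
6 1 2 a
6 7 1 b
6 1 5 c
5 9 7 c
3 8 7 c
4 2 7 b
1 2 3 b
7 6 3 c
6 1 1 a
6 3 9 a
6 1 1 b", header = TRUE)

b_check <- numeric(0)
for (g in c("a", "b", "c")) {
  # Fit the model on one snakes group only, via lm's subset argument
  fit_g <- lm(target ~ birds + wolfs, data = df, subset = snakes == g)
  # predict() without newdata returns the fitted values for the rows used
  b_check[g] <- sum(predict(fit_g)) - sum(df$target[df$snakes == "a"])
}
b_check
```

Storing the results in a named numeric vector rather than a list avoids the later `as.numeric()` conversion and keeps the group labels attached to the values.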