I have a model fitted with the segmented package. It has one segmented variable and some categorical variables. The problem is that when I use the model for prediction, it only predicts using the segmented variable.
The problem can be recreated with this code:
library(segmented)
n=10
x=rep(seq(-3,3,l=n), 2)
z=c(rep(1, 10), rep(0, 10))
set.seed(1515)
y <- (x<0)*x/2 + 1 + 0.5*z + rnorm(x,sd=0.15)
segm <- segmented(lm(y ~ x + as.factor(z)), ~ x, psi=0.5)
newdf <- data.frame(expand.grid(x=seq(-3, 3, 0.5), z=c("1", "0")))
newdf$p1 <- predict(segm, newdata = newdf)
plot(newdf$x, newdf$p1)
You can see that the predict function returns the exact same value irrespective of the z variable value.
I would have expected the effect of the z variable to be included.
I also tried extracting the components of the prediction with type="terms", which the package documentation describes, but this fails as well:
Error in match.arg(type) : 'arg' should be one of “link”, “response”
In the version I have (I know some in the comments couldn't replicate the problem), the problem appears to be when the function identifies the variables and corresponding coefficients to include in the prediction. You can follow along using debugonce(predict.segmented) until lines 150-151.

The column names of X.noV are the variable names in newdata, and the names of estcoef.noV are the coefficient names. In the data, the variable is z, but the corresponding coefficient name is z1, so this doesn't work. Here are a couple of examples to show how this operates.

Here's a way you could make it work. First, we'll build your data:
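The code for this step is not shown here, so the following is only a minimal sketch of what it might look like. It assumes the data from the question are placed in a data frame with z stored as a factor, so that the formula can refer to it simply as z; the names dat and mod are assumptions, not taken from the original. The last two lines also illustrate the mismatch described above between variable names and coefficient names.

```r
# Sketch only: `dat` and `mod` are assumed names for illustration.
n <- 10
x <- rep(seq(-3, 3, length.out = n), 2)
z <- c(rep(1, 10), rep(0, 10))
set.seed(1515)
y <- (x < 0) * x / 2 + 1 + 0.5 * z + rnorm(length(x), sd = 0.15)

# Store z as a factor so the model formula can use `z` directly.
dat <- data.frame(x = x, y = y, z = factor(z))

# Plain linear model of the kind that is later passed to segmented().
mod <- lm(y ~ x + z, data = dat)

names(dat)        # variable names:    "x" "y" "z"
names(coef(mod))  # coefficient names: "(Intercept)" "x" "z1"
```

The variable is called z, but its dummy-coded coefficient is called z1, which is the pair of names that fails to line up inside the prediction function.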
Now, use the same segmented() call, but save the linear model that is used as input to segmented() as an object - you can do this inline.

Generate your new data as before, making z the appropriate factor.

Here is where the difference comes - you also need to make the model matrix for the input linear model using the new data frame as the data. Note, you have to put values in the new data frame for the dependent variable, but they can be anything as they are not used in the construction of the model matrix. That's why I used y=0 above.

Now, attach to newer all the columns from X that aren't already there. You need to do this because the function checks that all variables in the formula are there (e.g., z has to be in the data), but later on when the calculations are done, z1 will also have to be there.

Now, the predictions will be different.
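The steps above can be sketched end to end as follows. This is a reconstruction under assumptions rather than the original code: newer and X are the names used in the text, while dat, mod, segm, extra and p1 are assumed for illustration, and the linear model is fitted on its own line instead of inline. Because the behaviour reportedly differs between versions of the package, the segmented part is guarded and wrapped in try().

```r
# Sketch only: `dat`, `mod`, `segm`, `extra` and `p1` are assumed names.
n <- 10
x <- rep(seq(-3, 3, length.out = n), 2)
z <- c(rep(1, 10), rep(0, 10))
set.seed(1515)
y <- (x < 0) * x / 2 + 1 + 0.5 * z + rnorm(length(x), sd = 0.15)
dat <- data.frame(x = x, y = y, z = factor(z))

# Linear model that serves as input to segmented().
mod <- lm(y ~ x + z, data = dat)

# New data: z is a factor with the same levels as in `dat`; y = 0 is a
# placeholder needed to build the model frame, its value is never used.
newer <- expand.grid(
  x = seq(-3, 3, 0.5),
  z = factor(c("1", "0"), levels = levels(dat$z)),
  y = 0
)

# Model matrix of the input linear model, evaluated on the new data.
X <- model.matrix(formula(mod), data = newer)

# Attach the model-matrix columns (such as z1) that `newer` lacks.
extra <- setdiff(colnames(X), names(newer))
newer <- cbind(newer, X[, extra, drop = FALSE])

# Fit and predict only if the package is available; try() keeps the script
# running if the installed version handles `newdata` differently.
if (requireNamespace("segmented", quietly = TRUE)) {
  p1 <- try({
    segm <- segmented::segmented(mod, ~ x, psi = 0.5)
    predict(segm, newdata = newer)
  })
}
```

After the cbind() step, newer holds both z (which the formula check looks for) and z1 (which the coefficient lookup needs).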
Created on 2024-01-18 with reprex v2.0.2