I have some data that I'm modeling using restricted cubic splines. I'm using the rcs transformation function in the rms package to generate the transformed variables for a linear model. Here is an example using 5 knots:
library('rms')
my_df <- data.frame(
y = -4 * -100:100 + -1.5 * (-100:100)**2 + 3 * (-100:100)**3 + rnorm(201, 0, 1e5),
x = -100:100
)
mod <- lm(y ~ rcs(x, 5), data = my_df)
After I fit the data, I'd like to find the predicted y values for a specific domain of x values. Here is what I'm doing now:
new_data <- data.frame(x = -3:3)
predict(mod, newdata = new_data)
However, this generates a warning message:
Warning message:
In rcspline.eval(x, nk = nknots, inclx = TRUE, pc = pc, fractied = fractied) :
5 knots requested with 7 unique values of x. knots set to 5 interior values.
What does this mean, and what is going on? I expected that the knot locations should already be defined in mod, so I don't understand why it seems to be trying to find new knots for the seven x values that I give it. I can avoid the warning message by providing more x values in new_data and just ignoring the ones I don't need, but I am concerned about what predict is actually doing.
According to Hadley's comment on this question, you shouldn't expect lm to work with rcs. A quick demonstration shows why there's a problem: the predictions vary depending on the number of x values, even for the same range, so it is definitely not a good idea to combine these functions.
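A minimal sketch of one way to check this yourself (not the original demonstration). It assumes the rms and Hmisc packages are installed. The seed and the two prediction grids are arbitrary choices for illustration. It first shows that the default knot locations depend on the x values they are computed from, then compares predictions at x = -3:3 obtained from two different new-data grids, once with lm and once with ols, the least-squares fitter that ships with rms and stores the knot locations with the fit. Whether the lm columns agree with each other may depend on the installed rms version, so no particular output is claimed here.

```r
library(rms)

set.seed(1)  # arbitrary seed, only for reproducibility
x <- -100:100
my_df <- data.frame(
  x = x,
  y = -4 * x - 1.5 * x^2 + 3 * x^3 + rnorm(201, 0, 1e5)
)

# Default knot locations are derived from the x values that are passed in,
# so a different x vector generally yields different knots.
k_train <- Hmisc::rcspline.eval(my_df$x, nk = 5, knots.only = TRUE)
k_new   <- Hmisc::rcspline.eval(-50:50, nk = 5, knots.only = TRUE)
print(k_train)
print(k_new)

# Two hypothetical prediction grids that both contain x = -3:3.
grid_narrow <- data.frame(x = -3:3)
grid_wide   <- data.frame(x = -50:50)
keep <- grid_wide$x %in% -3:3

# lm() + rcs(): predictions at the same x values, taken from each grid.
mod <- lm(y ~ rcs(x, 5), data = my_df)
lm_narrow <- suppressWarnings(predict(mod, newdata = grid_narrow))
lm_wide   <- predict(mod, newdata = grid_wide)[keep]

# ols() + rcs(): the fit keeps the knots chosen from the training data.
fit <- ols(y ~ rcs(x, 5), data = my_df)
ols_narrow <- predict(fit, newdata = grid_narrow)
ols_wide   <- predict(fit, newdata = grid_wide)[keep]

# Side-by-side comparison at x = -3:3.
print(cbind(x = -3:3, lm_narrow, lm_wide, ols_narrow, ols_wide))
```

If the two lm columns differ while the two ols columns match, the knots are being recomputed from new_data at prediction time, which is what the warning in the question suggests. In that case, fitting with ols instead of lm is probably the simpler route.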