I am trying to do non-linear regression with the R genetic programming package (rgp), using the technique described here: Fitting a curve to specific data (see second method). I am using the heartrate data from the R package drc:
library(drc)
head(heartrate)
# pressure rate
#1 50.85 348.76
#2 54.92 344.45
#3 59.23 343.05
#4 61.91 332.92
#5 65.22 315.31
#6 67.79 313.50
library(rgp)
res <- symbolicRegression(rate ~ pressure, data=heartrate)
(symbreg <- res$population[[which.min(sapply(res$population, res$fitnessFunction))]])
#function (pressure)
#pressure + (pressure/0.853106872646055 + pressure)
library(ggplot2)  # needed for ggplot(), geom_point() and geom_line()
ggplot() +
geom_point(data=heartrate, aes(pressure,rate), size = 3) +
geom_line(data=data.frame(symbx=heartrate$pressure,
symby=sapply(heartrate$pressure, symbreg)),
aes(symbx, symby), colour = "red")
However, the resulting regression line is clearly incorrect. The distribution of the data points indicates a curvilinear relation, with rate decreasing as pressure increases (an inverse association). The regression line generated, however, is linear and runs in the wrong direction.
Where is the error?
Edit:
Using increased steps as suggested by @cuttlefish44 in comments:
res = symbolicRegression(rate ~ pressure, data = heartrate, stopCondition = makeStepsStopCondition(45000))
(symbreg <- res$population[[which.min(sapply(res$population, res$fitnessFunction))]])
#function (pressure)
#exp(exp(exp(cos(cos(-9.23878724686801/pressure)))))
It took 8 minutes to complete. The plot is:
The direction of the regression line is better than above (!), but this suggests it will take a really long time to reach the obvious direction. The regression line from the function obtained by @cuttlefish44 is similar and also not a really good fit.
You may have already read this but I think your answer is hidden somewhere inside this introduction to RGP package written by Oliver Flasch.
I don't know anything about the rgp package, but if you only want a linear regression, why don't you use the lm() function from base R (the stats package)? At least you would be able to estimate the parameters β0 and β1 for an ordinary least squares regression:
rate = β1*pressure + β0
linear regression with ggplot2
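The linear.model object referred to in this answer is never defined in the post. A minimal sketch of how it could be created, assuming a plain lm() fit of rate on pressure: the data frame heart_sample is a hypothetical name holding only the six rows printed by head(heartrate) in the question, so the block runs without drc installed. For a real analysis, the full heartrate data frame should be used instead.

```r
# Six rows copied from head(heartrate) in the question (assumption: the full
# drc::heartrate data frame would be used in practice).
heart_sample <- data.frame(
  pressure = c(50.85, 54.92, 59.23, 61.91, 65.22, 67.79),
  rate     = c(348.76, 344.45, 343.05, 332.92, 315.31, 313.50)
)

# Ordinary least squares fit of rate = b1 * pressure + b0
linear.model <- lm(rate ~ pressure, data = heart_sample)

# Intercept (b0) and slope (b1)
coef(linear.model)
```

On these six rows the slope is negative, which at least matches the direction visible in the data.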
You can access the coefficients with
linear.model$coefficients
You can still manipulate the predicted values with
linear.model$fitted.values
You have access to the residuals with:
linear.model$residuals
If you want to fit the curve more accurately, the linear model might not be sufficient; you can try glm, or a polynomial regression, and select the best model with the AIC or BIC criteria.
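As a rough illustration of that model comparison, the sketch below fits a straight line and a quadratic polynomial with lm() and compares them with AIC() and BIC() from the stats package. The names heart_sample, fit_linear, fit_quadratic, aic_table and bic_table are hypothetical, and the data are only the six rows shown in the question, so the numbers themselves mean little; with the full heartrate data the same calls apply.

```r
# Six rows copied from head(heartrate) in the question (assumption: replace
# with the full drc::heartrate data frame for a meaningful comparison).
heart_sample <- data.frame(
  pressure = c(50.85, 54.92, 59.23, 61.91, 65.22, 67.79),
  rate     = c(348.76, 344.45, 343.05, 332.92, 315.31, 313.50)
)

# Candidate models: straight line vs. second-degree polynomial
fit_linear    <- lm(rate ~ pressure, data = heart_sample)
fit_quadratic <- lm(rate ~ poly(pressure, 2), data = heart_sample)

# Lower AIC / BIC indicates the preferred model among the candidates
aic_table <- AIC(fit_linear, fit_quadratic)
bic_table <- BIC(fit_linear, fit_quadratic)
aic_table
bic_table
```

Higher polynomial degrees can be added to the comparison in the same way, although a polynomial may still be a poor match for data that level off at both ends.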