rgp (R genetic programming) package - not able to do regression

I am trying to do non-linear regression with the R genetic programming package (rgp), using the technique described here: Fitting a curve to specific data (see the second method). I am using the heartrate data set from the R package drc:

library(drc)

head(heartrate)
#  pressure   rate
#1    50.85 348.76
#2    54.92 344.45
#3    59.23 343.05
#4    61.91 332.92
#5    65.22 315.31
#6    67.79 313.50

library(rgp)
library(ggplot2)  # needed for ggplot() below

res <- symbolicRegression(rate ~ pressure, data=heartrate)

(symbreg <- res$population[[which.min(sapply(res$population, res$fitnessFunction))]])
#function (pressure) 
#pressure + (pressure/0.853106872646055 + pressure)

ggplot() + 
    geom_point(data=heartrate, aes(pressure,rate), size = 3) +
    geom_line(data=data.frame(symbx=heartrate$pressure, 
                              symby=sapply(heartrate$pressure, symbreg)), 
              aes(symbx, symby), colour = "red")

However, the regression line I get is clearly incorrect. The data points indicate a curvilinear relation in which rate decreases as pressure increases (an inverse association), but the fitted line is linear and slopes in the wrong direction.

[Plot: data points and the evolved regression line]
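Simplifying the first evolved function shows why the line is straight and rising: every term is a multiple of pressure, so the whole expression is a line through the origin with a positive slope.

$$
\text{pressure} + \left(\frac{\text{pressure}}{0.853106872646055} + \text{pressure}\right)
= \left(2 + \frac{1}{0.853106872646055}\right)\text{pressure}
\approx 3.172\,\text{pressure}
$$

At the first data point (pressure = 50.85) this predicts about 161.3, against an observed rate of 348.76, so both the slope and the level are off.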

Where is the error?

Edit:

Increasing the number of steps, as suggested by @cuttlefish44 in the comments:

res = symbolicRegression(rate ~ pressure, data = heartrate, stopCondition = makeStepsStopCondition(45000))

(symbreg <- res$population[[which.min(sapply(res$population, res$fitnessFunction))]])
#function (pressure) 
#exp(exp(exp(cos(cos(-9.23878724686801/pressure)))))
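The selection line `res$population[[which.min(sapply(res$population, res$fitnessFunction))]]` scores every evolved function with the fitness function and keeps the one with the lowest error. The sketch below applies the same pattern in plain R to the two functions reported above. Assumptions: it scores only the six rows printed by head(heartrate), and it uses root mean squared error as a stand-in fitness function (`rmse_fitness` is a hypothetical helper, not necessarily the exact fitness rgp uses).

```r
# The six rows of heartrate shown by head(heartrate) above
pressure <- c(50.85, 54.92, 59.23, 61.91, 65.22, 67.79)
rate     <- c(348.76, 344.45, 343.05, 332.92, 315.31, 313.50)

# The two evolved functions reported in the question
candidates <- list(
  function(pressure) pressure + (pressure/0.853106872646055 + pressure),
  function(pressure) exp(exp(exp(cos(cos(-9.23878724686801/pressure)))))
)

# Hypothetical stand-in fitness: root mean squared error (lower is better)
rmse_fitness <- function(f) sqrt(mean((f(pressure) - rate)^2))

# Same pattern as res$population[[which.min(sapply(...))]]
errors     <- sapply(candidates, rmse_fitness)
best_index <- which.min(errors)
best       <- candidates[[best_index]]

print(errors)
print(best_index)
```

On these six rows the second function has the much lower error and decreases as pressure increases, which matches the observation that the longer run at least gets the direction right.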

It took 8 minutes to complete. The plot is:

[Plot: data points and the regression line from the longer run]

The direction of the regression line is better than before (!), but it suggests that reaching even the obvious trend takes a very long run. The function obtained by @cuttlefish44 gives a similar line, which is also not a good fit.

Answer by bobolafrite:

You may have already read it, but I think your answer is somewhere in this introduction to the RGP package written by Oliver Flasch.

I don't know anything about the rgp package, but if you only want a linear regression, why not use the lm() function from the stats package that comes with R?

At least you would be able to estimate the parameters β0 and β1 of an ordinary least squares regression:

rate = β1*pressure + β0
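For reference, with x = pressure and y = rate, the OLS estimates returned by lm() can be written in closed form:

$$
\hat\beta_1 = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sum_i (x_i - \bar x)^2},
\qquad
\hat\beta_0 = \bar y - \hat\beta_1 \bar x
$$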

     library(drc)      # provides the heartrate data
     library(ggplot2)

     linear.model <- lm(rate ~ pressure, data = heartrate)

     ggplot(data = heartrate, aes(x = pressure, y = rate)) +
         geom_point() +
         geom_smooth(method = "lm", formula = y ~ x, colour = "red")

[Plot: linear regression with ggplot2]

You can access the coefficients with linear.model$coefficients.

You can get the fitted (predicted) values with linear.model$fitted.values.

You can get the residuals with linear.model$residuals.
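The $ accessors work, but stats also provides the generic extractors coef(), fitted() and residuals(), which return the same components. A minimal sketch, using only the six rows shown by head(heartrate) in the question so it runs without drc (`heart6` is a hypothetical name), that also checks lm() against the closed-form OLS estimates:

```r
# First six rows of heartrate, as printed by head(heartrate) in the question
heart6 <- data.frame(
  pressure = c(50.85, 54.92, 59.23, 61.91, 65.22, 67.79),
  rate     = c(348.76, 344.45, 343.05, 332.92, 315.31, 313.50)
)

linear.model <- lm(rate ~ pressure, data = heart6)

# Closed-form OLS estimates of beta0 (intercept) and beta1 (slope)
x <- heart6$pressure
y <- heart6$rate
beta1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0 <- mean(y) - beta1 * mean(x)

# Extractors equivalent to $coefficients, $fitted.values and $residuals
print(coef(linear.model))
print(c(beta0 = beta0, beta1 = beta1))
print(fitted(linear.model))
print(residuals(linear.model))

# Fitted values plus residuals reconstruct the observed rates
print(all.equal(unname(fitted(linear.model) + residuals(linear.model)), y))
```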

If you want to fit the curve more accurately, a linear model may not be sufficient; you can try glm() or a polynomial regression, and select the best model using the AIC or BIC criterion.
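A minimal sketch of the polynomial route, again assuming only the six rows printed by head(heartrate) so it runs without drc (`heart6`, `fits` and `scores` are hypothetical names). It fits polynomials of degree 1 to 3 with lm() and poly() and compares them with AIC() and BIC(), where lower is better:

```r
# First six rows of heartrate, as printed by head(heartrate) in the question
heart6 <- data.frame(
  pressure = c(50.85, 54.92, 59.23, 61.91, 65.22, 67.79),
  rate     = c(348.76, 344.45, 343.05, 332.92, 315.31, 313.50)
)

# Candidate models: straight line, quadratic and cubic in pressure
fits <- list(
  linear    = lm(rate ~ pressure, data = heart6),
  quadratic = lm(rate ~ poly(pressure, 2), data = heart6),
  cubic     = lm(rate ~ poly(pressure, 3), data = heart6)
)

# Both criteria penalise extra parameters; lower is better
scores <- data.frame(
  model = names(fits),
  AIC   = sapply(fits, AIC),
  BIC   = sapply(fits, BIC)
)
print(scores)
print(scores$model[which.min(scores$AIC)])
```

With only six points this comparison is purely illustrative; on the full heartrate data the preferred degree may differ. Note that glm() with its default gaussian family fits the same model as lm(), so a different family or link would be needed for glm() to add anything here.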