Why do "segmented" and "selgmented" functions in the package "segmented" give different outputs?

70 views Asked by At

I am using the selgmented function in the segmented package to select the number of breakpoints in a time-series regression. This function behaves as expected and provides a segmented object. However, if I run the same dataset using the segmented function with npsi set to the number of breakpoints determined by selgmented, it sometimes returns different results (i.e., different segment slopes and different breakpoints). I have tried modifying some of the parameters using seg.control, but I am usually unsuccessful in achieving identical results between the two functions. I am wondering what is different between these functions and why they produce different results. Usually these differences are minor, but I'd like to understand why they occur.

I have included an example using the Plant organ dataset included with the segmented package.

library(segmented) 

data(plant)
plant_RKV <- plant[which(plant$group=="RKV"), ]

out.lm <- lm(y ~ time, data=plant_RKV)

os <- selgmented(out.lm, type="bic", Kmax=5)
o <- segmented(out.lm, npsi=4)

> summary(os)

        ***Regression Model with Segmented Relationship(s)***

Call: 
segmented.lm(obj = olm, seg.Z = seg.Z, psi = startpsi[[i - 1]], 
    control = control1)

Estimated Break-Point(s):
              Est. St.Err
psi1.time 314.495 13.909
psi2.time 471.338 15.939
psi3.time 542.613  9.392
psi4.time 565.911  7.278

Meaningful coefficients of the linear terms:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.0694477  0.0281773   2.465    0.022 *  
time         0.0024848  0.0001249  19.895 1.49e-15 ***
U1.time     -0.0015098  0.0002187  -6.903       NA    
U2.time     -0.0030369  0.0011887  -2.555       NA    
U3.time      0.0056947  0.0022973   2.479       NA    
U4.time     -0.0043988  0.0019883  -2.212       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02418 on 22 degrees of freedom
Multiple R-Squared: 0.9856,  Adjusted R-squared: 0.9798 

Convergence attained in 3 iterations (rel. change 2.3914e-10)

> summary(o)

        ***Regression Model with Segmented Relationship(s)***

Call: 
segmented.lm(obj = out.lm, npsi = 4)

Estimated Break-Point(s):
              Est. St.Err
psi1.time 295.723 16.654
psi2.time 458.156 11.044
psi3.time 544.410  8.446
psi4.time 566.120  7.659

Meaningful coefficients of the linear terms:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.0497251  0.0359206   1.384     0.18    
time         0.0025961  0.0001788  14.522  9.4e-13 ***
U1.time     -0.0014525  0.0002439  -5.956       NA    
U2.time     -0.0027260  0.0006178  -4.412       NA    
U3.time      0.0050112  0.0020535   2.440       NA    
U4.time     -0.0041930  0.0019796  -2.118       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02407 on 22 degrees of freedom
Multiple R-Squared: 0.9858,  Adjusted R-squared: 0.9799 

Boot restarting based on 6 samples. Last fit:
Convergence *not* attained in 1 iterations (rel. change 0.062135)

If I set number of bootstrap samples and the maximum number of iterations, then I can get the outputs to align.

out.lm <- lm(y ~ time, data=data_plant)

os <- selgmented(out.lm, type="bic", Kmax=5, control=seg.control(n.boot=1000, it.max = 5))
o <- segmented(out.lm, npsi=4, control=seg.control(n.boot=1000, it.max = 5))

> summary(os)

        ***Regression Model with Segmented Relationship(s)***

Call: 
segmented.lm(obj = olm, seg.Z = seg.Z, psi = startpsi[[i - 1]], 
    control = control1)

Estimated Break-Point(s):
              Est. St.Err
psi1.time 300.220 14.015
psi2.time 469.728 15.680
psi3.time 542.613  9.245
psi4.time 565.911  7.164

Meaningful coefficients of the linear terms:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.0519970  0.0307320   1.692    0.105    
time         0.0025830  0.0001437  17.971 1.23e-14 ***
U1.time     -0.0015229  0.0002118  -7.190       NA    
U2.time     -0.0031220  0.0011670  -2.675       NA    
U3.time      0.0056947  0.0022612   2.518       NA    
U4.time     -0.0043988  0.0019571  -2.248       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0238 on 22 degrees of freedom
Multiple R-Squared: 0.9861,  Adjusted R-squared: 0.9804 

Convergence *not* attained in 5 iterations (rel. change 0.00043136) 

> summary(o)

        ***Regression Model with Segmented Relationship(s)***

Call: 
segmented.lm(obj = out.lm, npsi = 4, control = seg.control(n.boot = 1000, 
    it.max = 5))

Estimated Break-Point(s):
              Est. St.Err
psi1.time 300.220 14.015
psi2.time 469.728 15.680
psi3.time 542.613  9.245
psi4.time 565.911  7.164

Meaningful coefficients of the linear terms:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.0519970  0.0307320   1.692    0.105    
time         0.0025830  0.0001437  17.971 1.23e-14 ***
U1.time     -0.0015229  0.0002118  -7.190       NA    
U2.time     -0.0031220  0.0011670  -2.675       NA    
U3.time      0.0056947  0.0022612   2.518       NA    
U4.time     -0.0043988  0.0019571  -2.248       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0238 on 22 degrees of freedom
Multiple R-Squared: 0.9861,  Adjusted R-squared: 0.9804 

Boot restarting based on 12 samples. Last fit:
Convergence attained in 4 iterations (rel. change 2.2993e-12)

However, changing the parameters with seg.control does not seem to work will all examples. Part of that may be due to my unfamiliarity with the specifics of these parameters.

Here is a second example from my own dataset (time series) where the breakpoints are noticeably different between the two functions. I am unable to modify the parameters to get the outputs to align.

x <- c(0, 43, 156, 238, 254, 323, 674, 870, 1193, 1555, 1926, 2279, 2660, 3103, 3479, 3832, 4214, 4929, 5665, 6401, 7122, 7850, 9314, 10062, 11151, 11869)
y <- c(7.31322039, 7.32974969, 6.85224257, 7.11882625, 6.56526497, 6.13122649, 6.30991828, 5.88610403, 4.78749174, 3.76120012, 4.02535169, 4.09434456, 2.23216263, 1.97685495, 2.86789890, 2.80336038, 1.28093385, 0.53062825, 0.83290912, 1.02961942, 0.64185389, 0.18232156, 0.09531018, 0.09531018, 0.33647224, 0.51282363)

out.lm2 <- lm(y ~ x)

os2 <- selgmented(out.lm2, type="bic", Kmax=5)
o2 <- segmented(out.lm2, npsi=2)

par(mfrow=c(1,2))
plot.segmented(os2)
plot.segmented(os2, res=TRUE, add=TRUE, conf.level=0.95, shade=TRUE)
plot.segmented(o2)
plot.segmented(o2, res=TRUE, add=TRUE, conf.level=0.95, shade=TRUE) 

comparison of segmented and selgmented piecewise regressions

I appreciate any feedback on why the outputs of the segmented and selgmented functions differ and how to modify that if possible.

0

There are 0 answers