I am using the selgmented function in the segmented package to select the number of breakpoints in a time-series regression. This function behaves as expected and provides a segmented object. However, if I run the same dataset using the segmented function with npsi set to the number of breakpoints determined by selgmented, it sometimes returns different results (i.e., different segment slopes and different breakpoints). I have tried modifying some of the parameters using seg.control, but I am usually unsuccessful in achieving identical results between the two functions. I am wondering what is different between these functions and why they produce different results. Usually these differences are minor, but I'd like to understand why they occur.
I have included an example using the Plant organ dataset included with the segmented package.
library(segmented)
data(plant)
plant_RKV <- plant[which(plant$group=="RKV"), ]
out.lm <- lm(y ~ time, data=plant_RKV)
os <- selgmented(out.lm, type="bic", Kmax=5)
o <- segmented(out.lm, npsi=4)
> summary(os)
***Regression Model with Segmented Relationship(s)***
Call:
segmented.lm(obj = olm, seg.Z = seg.Z, psi = startpsi[[i - 1]],
control = control1)
Estimated Break-Point(s):
Est. St.Err
psi1.time 314.495 13.909
psi2.time 471.338 15.939
psi3.time 542.613 9.392
psi4.time 565.911 7.278
Meaningful coefficients of the linear terms:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0694477 0.0281773 2.465 0.022 *
time 0.0024848 0.0001249 19.895 1.49e-15 ***
U1.time -0.0015098 0.0002187 -6.903 NA
U2.time -0.0030369 0.0011887 -2.555 NA
U3.time 0.0056947 0.0022973 2.479 NA
U4.time -0.0043988 0.0019883 -2.212 NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.02418 on 22 degrees of freedom
Multiple R-Squared: 0.9856, Adjusted R-squared: 0.9798
Convergence attained in 3 iterations (rel. change 2.3914e-10)
> summary(o)
***Regression Model with Segmented Relationship(s)***
Call:
segmented.lm(obj = out.lm, npsi = 4)
Estimated Break-Point(s):
Est. St.Err
psi1.time 295.723 16.654
psi2.time 458.156 11.044
psi3.time 544.410 8.446
psi4.time 566.120 7.659
Meaningful coefficients of the linear terms:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0497251 0.0359206 1.384 0.18
time 0.0025961 0.0001788 14.522 9.4e-13 ***
U1.time -0.0014525 0.0002439 -5.956 NA
U2.time -0.0027260 0.0006178 -4.412 NA
U3.time 0.0050112 0.0020535 2.440 NA
U4.time -0.0041930 0.0019796 -2.118 NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.02407 on 22 degrees of freedom
Multiple R-Squared: 0.9858, Adjusted R-squared: 0.9799
Boot restarting based on 6 samples. Last fit:
Convergence *not* attained in 1 iterations (rel. change 0.062135)
If I set number of bootstrap samples and the maximum number of iterations, then I can get the outputs to align.
out.lm <- lm(y ~ time, data=data_plant)
os <- selgmented(out.lm, type="bic", Kmax=5, control=seg.control(n.boot=1000, it.max = 5))
o <- segmented(out.lm, npsi=4, control=seg.control(n.boot=1000, it.max = 5))
> summary(os)
***Regression Model with Segmented Relationship(s)***
Call:
segmented.lm(obj = olm, seg.Z = seg.Z, psi = startpsi[[i - 1]],
control = control1)
Estimated Break-Point(s):
Est. St.Err
psi1.time 300.220 14.015
psi2.time 469.728 15.680
psi3.time 542.613 9.245
psi4.time 565.911 7.164
Meaningful coefficients of the linear terms:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0519970 0.0307320 1.692 0.105
time 0.0025830 0.0001437 17.971 1.23e-14 ***
U1.time -0.0015229 0.0002118 -7.190 NA
U2.time -0.0031220 0.0011670 -2.675 NA
U3.time 0.0056947 0.0022612 2.518 NA
U4.time -0.0043988 0.0019571 -2.248 NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.0238 on 22 degrees of freedom
Multiple R-Squared: 0.9861, Adjusted R-squared: 0.9804
Convergence *not* attained in 5 iterations (rel. change 0.00043136)
> summary(o)
***Regression Model with Segmented Relationship(s)***
Call:
segmented.lm(obj = out.lm, npsi = 4, control = seg.control(n.boot = 1000,
it.max = 5))
Estimated Break-Point(s):
Est. St.Err
psi1.time 300.220 14.015
psi2.time 469.728 15.680
psi3.time 542.613 9.245
psi4.time 565.911 7.164
Meaningful coefficients of the linear terms:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0519970 0.0307320 1.692 0.105
time 0.0025830 0.0001437 17.971 1.23e-14 ***
U1.time -0.0015229 0.0002118 -7.190 NA
U2.time -0.0031220 0.0011670 -2.675 NA
U3.time 0.0056947 0.0022612 2.518 NA
U4.time -0.0043988 0.0019571 -2.248 NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.0238 on 22 degrees of freedom
Multiple R-Squared: 0.9861, Adjusted R-squared: 0.9804
Boot restarting based on 12 samples. Last fit:
Convergence attained in 4 iterations (rel. change 2.2993e-12)
However, changing the parameters with seg.control does not seem to work will all examples. Part of that may be due to my unfamiliarity with the specifics of these parameters.
Here is a second example from my own dataset (time series) where the breakpoints are noticeably different between the two functions. I am unable to modify the parameters to get the outputs to align.
x <- c(0, 43, 156, 238, 254, 323, 674, 870, 1193, 1555, 1926, 2279, 2660, 3103, 3479, 3832, 4214, 4929, 5665, 6401, 7122, 7850, 9314, 10062, 11151, 11869)
y <- c(7.31322039, 7.32974969, 6.85224257, 7.11882625, 6.56526497, 6.13122649, 6.30991828, 5.88610403, 4.78749174, 3.76120012, 4.02535169, 4.09434456, 2.23216263, 1.97685495, 2.86789890, 2.80336038, 1.28093385, 0.53062825, 0.83290912, 1.02961942, 0.64185389, 0.18232156, 0.09531018, 0.09531018, 0.33647224, 0.51282363)
out.lm2 <- lm(y ~ x)
os2 <- selgmented(out.lm2, type="bic", Kmax=5)
o2 <- segmented(out.lm2, npsi=2)
par(mfrow=c(1,2))
plot.segmented(os2)
plot.segmented(os2, res=TRUE, add=TRUE, conf.level=0.95, shade=TRUE)
plot.segmented(o2)
plot.segmented(o2, res=TRUE, add=TRUE, conf.level=0.95, shade=TRUE)
comparison of segmented and selgmented piecewise regressions
I appreciate any feedback on why the outputs of the segmented and selgmented functions differ and how to modify that if possible.