Constrained GAM in r: increasing

94 views Asked by At

I am looking to enforce a generally increasing GAM which doesn't have to be monotonic, however, it should be increasing and A should not reach 0 once the B variable in my example exceeds 15,000. I am including a sample df for reproduction as well as my current progress. It is apparent that the value of A at the greatest B point is turning downward (if plotted) for one of the two unique XY pairings, however, I don't want an A value of 0 when B exceeds 15,000. It should be noted that it is ok for A to equal 0 when B is below 0 (B can be negative at A=0). This is just an example of two pairings, and there are several more so I need a general function which makes it difficult to set unique weights for each.

Step 1: load packages and read in df
library(mgcv)
library(dplyr)

df <- structure(list(X = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), Y = c(1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2), grp = c("0", "1", "2", "3", "4", "5", "6", 
"7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", 
"18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", 
"29", "30", "31", "32", "33", "34", "35", "0", "1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", 
"27", "28", "29", "30", "31", "32", "33", "34", "35"), A = c(1.13659381866455, 
1.13659477233887, 1.13658058643341, 1.13659477233887, 1.13656425476074, 
1.13656532764435, 1.13658249378204, 1.13656723499298, 0.709016442298889, 
0.708269774913788, 0.707276940345764, 0.705555200576782, 0.703566908836365, 
0.705580234527588, 0.711442351341248, 0.709806680679321, 0.706768035888672, 
0.710436880588531, 0.711900770664215, 0.627014517784119, 0.624246716499329, 
0.637109100818634, 0.6302330493927, 0.712375164031982, 0.632026135921478, 
0.707980036735535, 1.02401030063629, 1.24682903289795, 1.16107773780823, 
1.72039294242859, 1.65212118625641, 1.77960109710693, 2.14993286132812, 
2.56561064720154, 2.75121307373047, 3.24887132644653, 0, 0.471771329641342, 
0.711109101772308, 0.79891711473465, 0.931526362895966, 0.968687295913696, 
0.995432734489441, 1.0633533000946, 1.16641199588776, 1.34169447422028, 
1.49520266056061, 1.63908088207245, 1.7665741443634, 1.88498985767365, 
1.99606990814209, 2.09789752960205, 2.19571089744568, 2.28780961036682, 
2.37474989891052, 2.44801378250122, 2.52833318710327, 2.60511207580566, 
2.67927670478821, 2.7647979259491, 2.82285761833191, 2.97204279899597, 
3.10502099990845, 3.23239874839783, 3.33626580238342, 3.47603249549866, 
3.57603311538696, 3.68949151039124, 3.95258760452271, 3.93370676040649, 
4.04461479187012, 4.92624521255493), B = c(300, 400, 530, 600, 
730, 770, 800, 880, 1000, 1250, 1500, 1750, 2000, 2250, 2500, 
2750, 3000, 3250, 3500, 3750, 4000, 4250, 4500, 4750, 5000, 5500, 
6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500, 10000, 15000, 
300, 400, 530, 600, 730, 770, 800, 880, 1000, 1250, 1500, 1750, 
2000, 2250, 2500, 2750, 3000, 3250, 3500, 3750, 4000, 4250, 4500, 
4750, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500, 
10000, 15000)), row.names = c(NA, -72L), class = c("tbl_df", 
"tbl", "data.frame"))
step 2: perform a GAM for each unique X,Y pairing (2 here) over all 35 groups contained within.
new_dat <- df %>% 
  summarise(gam_model=list(gam(B~s(A))), .by = c(X,Y))
step 3: perform row-wise GAM prediction when A ='s 0
pred_df <- data.frame(A= 0)
gam_predict <- function(m) predict(m,pred_df)
new_dat$gam_pred_0 <- sapply(new_dat$gam_model,gam_predict)
1

There are 1 answers

1
user11057680 On

shape constrained additive models (scam) may be utilized to account for monotonicity, convexity or a combination of the two. With respect to this application, a shape constrained smoothing term constructed using penalized splines (specifically Monotone increasing P-splines) was employed. With step 1 remaining the same, the following two steps would be augmented to the following.

library(scam)
new_dat <- df %>% 
  summarise(scam_model=list(scam(B~s(A,bs="mpi"))), .by = c(X,Y))

pred_df <- data.frame(A= 0)
scam_predict <- function(m) predict(m,pred_df)
new_dat$scam_pred_0 <- sapply(new_dat$scam_model,scam_predict)

it is worth noting - the run time of this over several obs. has been alarmingly slow in opposition to a gam call. 1 hr vs still running. If someone knows of a way to optimize or alternative means let me know.