Extracting r.squared values from an lmList object

355 views Asked by At

I'm sorry if this is a dumb question but I've been trying to extract the r squared values from an lmList model object for a few days no with no success. I'm running a custom bootstrap function to allow me to cross validate my results and I'm able to successfully pull out coefficients and RMSE.

GroupedBootStrampling <- function(slice, root, data, iters, met) {

  print('Initializing Modeling')
  root <- as.vector(root[slice, ])

  coefs <- list()
  RMSE <- list()
  r_sq <- list()
  mods <- list()

  for (i in 1:iters){

    print('Splitting Data into training and testing sets')
    comb <- splitter(data,0.8)
    train <- comb[[1]]
    test <- comb[[2]]
    print('Data sucessfully split')

    # Create var functional form to pass to lmList model object
    train$var <- train$cpc**(1/root[, 1])*train$cost**(1/root[, 2])
    test$var <- test$cpc**(1/root[, 1])*test$cost**(1/root[, 2])
    train[,'temp'] <- train[,met]

    # Use lmList to train and test the models
    print('Creating Models for each campaign')
    model <- lmList(temp ~ -1 + var + I(var*Mon) + I(var*Tues) + I(var*Wed) +             I(var*Thurs) + I(var*Fri) + I(var*Sat) | grouper , data = train)
    #model <- as.formula("temp ~ -1 + var + I(var*Mon) + I(var*Tues) + I(var*Wed) + I(var*Thurs) + I(var*Fri) + I(var*Sat)")
    #model <- ridgeList(train, model, "grouper")
    mods[[i]] <- model

    # Predict clicks based on the Test dataset
    print('Generating predictions')
    preds <- predict(model, test, se.fit = T)

    preds$RMSE <- as.numeric((test[,met] - preds$fit)**2)

    # Predict RMSE off by subtracting the actual clicks from the predicted clicks for the test dataset
    RMSE[[i]] <- as.data.frame(preds %>% group_by(grouper) %>% summarize(RMSE = (mean(RMSE)**0.5)))

    # extract the coefficients from the model
    coefs[[i]] <- data.frame(groups = unique(train$grouper), coef(model, augFrame = T, data = train, which = 'grouper'))

    # extract the r.squared
    r_sq[[i]] <- summary(model)$r.squared

  }

  # Pick out average RMSE and Average coefs
  avg_RMSE <- do.call(rbind,RMSE)
  avg_coefs <- do.call(rbind,coefs)
  avg_r.sq <- do.call(rbind,r_sq)
  avg_coefs$RMSE <- avg_RMSE$RMSE
  avg_coefs$r_sq <- avg_r.sq


  # Create a dataframe of deisred output to be returned from the function
  print('Collecting Results')
  outs <- avg_coefs %>% group_by(groups) %>% summarize(var_coef = mean(var),
                                                         int_mon_coef = mean(I.var...Mon.),
                                                         int_tues_coef = mean(I.var...Tues.),
                                                         int_wed_coef = mean(I.var...Wed.),
                                                         int_thurs_coef = mean(I.var...Thurs.),
                                                         int_fri_coef = mean(I.var...Fri.),
                                                         int_sat_coef = mean(I.var...Sat.),
                                                         RMSE = mean(RMSE),
                                                         r_sq = mean(r_sq))

  outs$cpc_root <- root[,1]
  outs$cost_root <- root[,2]


  outs <- na_killer(outs)
  #   outs <- pred_optimizer2(outs, data)

  print('Loop Complete')
  return(outs)
}

However when I attempt to run this function which has worked in the past, prior to me adding in the r-squared calculation, I receive the following error:

Error in `[<-`(`*tmp*`, use, use, ii, value = lst[[ii]]) : 
subscript out of bounds 

Which when traced back points to the r.squared calculation. I've tried various combinations for the subscript and I still can't seem to get it. If I generate an lmList outside of the iteration loop I can use summary(model)$r.squared and it works without any issues.

If anyone has any ideas I'd really appreciate it!

0

There are 0 answers