PLS in R: Model training and predicting values with two Y variables

I would like to train a PLS model and predict values for more than one Y variable, but I run into problems when I try this approach with the code below:

library(pls)

#First simulate some data
set.seed(123)
bands=20
data <- data.frame(matrix(runif(60*bands),ncol=bands))
colnames(data) <- paste0(1:bands)
data$nitrogen <- rpois(60,10)
data$carbon <- rpois(60,10)
#

#Training data set
cal_BD<-data[1:50,]

#Validation data set
val_BD<-data[51:60,]

# define explanatory variables (x)
spectra <- cal_BD[,1:20]

#Build PLS model using training data only
mod_pls <- plsr(carbon + nitrogen ~ spectra,
ncomp = 20, data =cal_BD, validation = "LOO", jackknife = TRUE)
summary(mod_pls)
#

#Prediction in validation data set
est_pls<-predict(mod_pls, comps = 20, newdata = val_BD)
est_pls
#

1) The model doesn't work when I use carbon + nitrogen as the response; and

2) I would like to create a new data frame with the estimated values for carbon and nitrogen, using the code below:

val_BD2<-val_BD[,-(21:22)] # remove carbon + nitrogen because my goal is to predict these values
est_pls<-predict(mod_pls, comps = 20, newdata = val_BD2)#Prediction in validation data set (only X's)
final_est_DF<-cbind(val_BD2,est_pls[,1],est_pls[,2])

My desired output, with estimated rather than observed values for carbon and nitrogen, is:

            1          2         3  ... carbon  nitrogen
51 0.04583117 0.93529980 0.6299731  ... 15.3     8.6
52 0.44220007 0.30122890 0.1838285  ... 10.0     7.1
53 0.79892485 0.06072057 0.8636441  ...  9.0     7.3
54 0.12189926 0.94772694 0.7465680  ... 11.1     6.5
55 0.56094798 0.72059627 0.6682846  ... 10.3     8.4
56 0.20653139 0.14229430 0.6180179  ... 13.9     9.1
...

Is this possible?

There is 1 answer

Sergey Kucheryavskiy (best answer)

You can either fit two separate PLS models, make predictions with each, and combine the results into a single data frame manually, or fit one (PLS2) model for both responses. The second makes sense only if the response variables are correlated. There seems to be no straightforward option for PLS2 regression via plsr in the pls package. You can try:

  1. Call the simpls.fit function directly (although the authors do not recommend this). See, for example: https://www.rdocumentation.org/packages/pls/versions/2.7-0/topics/simpls.fit. In this case you can specify Y as a matrix or data frame with two columns.

  2. Use another package that supports PLS2, e.g. https://www.rdocumentation.org/packages/plspm/versions/0.2-2/topics/plsreg2
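
The options above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the band names "b1".."b20", the object names, and the choice of 5 components are assumptions made for the example, and the spectra are stored as a single matrix column so that the formula can refer to them by name. Option B relies on the pls help pages describing the response as a vector or matrix, so a cbind() response may be worth trying before calling simpls.fit directly.

```r
library(pls)

# Simulated data in the spirit of the question: 60 samples, 20 bands.
set.seed(123)
bands <- 20
X <- matrix(runif(60 * bands), ncol = bands)
colnames(X) <- paste0("b", 1:bands)          # made-up, syntactic band names
dat <- data.frame(nitrogen = as.numeric(rpois(60, 10)),
                  carbon   = as.numeric(rpois(60, 10)),
                  spectra  = I(X))           # spectra kept as one matrix column

cal <- dat[1:50, ]    # training rows
val <- dat[51:60, ]   # validation rows
k <- 5                # number of components, chosen arbitrarily here
X_val <- unclass(val$spectra)                # plain 10 x 20 matrix

# Option A: two separate single-response models, combined manually.
mod_c <- plsr(carbon ~ spectra,   ncomp = k, data = cal)
mod_n <- plsr(nitrogen ~ spectra, ncomp = k, data = cal)
est_A <- data.frame(X_val,
                    carbon   = as.numeric(predict(mod_c, ncomp = k, newdata = val)),
                    nitrogen = as.numeric(predict(mod_n, ncomp = k, newdata = val)))

# Option B: one model with a two-column (matrix) response.
mod_2 <- plsr(cbind(carbon, nitrogen) ~ spectra, ncomp = k, data = cal)
pred_2 <- predict(mod_2, ncomp = k, newdata = val)[, , 1]   # 10 x 2 matrix
est_B <- data.frame(X_val, carbon = pred_2[, 1], nitrogen = pred_2[, 2])

# Option C: call simpls.fit directly with Y as a two-column matrix.
# Predictions are rebuilt by hand from the coefficients and the stored means.
Y_cal <- as.matrix(cal[, c("carbon", "nitrogen")])
fit <- simpls.fit(unclass(cal$spectra), Y_cal, ncomp = k)
B  <- fit$coefficients[, , k]                 # 20 x 2 coefficients for k components
B0 <- drop(fit$Ymeans - fit$Xmeans %*% B)     # intercept for each response
pred_C <- sweep(X_val %*% B, 2, B0, "+")      # 10 x 2 matrix of predictions

head(est_A[, c("b1", "b2", "carbon", "nitrogen")])
```

Note that the sketch passes ncomp rather than comps to predict(): as far as I understand the pls documentation, ncomp = k predicts from a model with the first k components, whereas comps = 20 would use component 20 only, which is probably not what the question intends.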