How to correctly create a loop using the h2o package


I would like to create a data frame that shows the accuracy obtained with different seeds and different deep learning activation methods. I have written code containing two nested loops (see below), but it fails with an error. How can I write this loop correctly?

Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page,  : 
ERROR MESSAGE:

Can only append one column

Attached is my code:

attach(iris)
train<-iris
test<-iris
invisible(capture.output(h2o.init(nthreads = -1))) # initializing with all CPU cores
trainHex <- as.h2o(train[1:200,])
testHex <- as.h2o(test)
x_names  <- colnames(trainHex[1:4])
SEED<-c(123456789,12345678,1234567)
method<-c("Rectifier", "Tanh", "TanhWithDropout", "RectifierWithDropout", "Maxout", "MaxoutWithDropout")
Res<-data.frame()

for(i in 1:6){
    for(j in 1:3){

        system.time(ann <- h2o.deeplearning(
            reproducible = TRUE,
            seed = SEED[j],
            x = x_names,
            y = "Species",
            training_frame = trainHex,epochs = 50,
            standardize = TRUE,
            nesterov_accelerated_gradient = T, # for speed
            activation = method[i] 
        ))
        #ann
        testHex$h20<-ifelse(predict(ann,newdata = testHex)>0.5,1,0)
        testHex<-as.data.frame(testHex)
        s<-xtabs(~Species +h20,data=testHex )
        accuracy<-sum(diag(s))/sum(s)
        tmp<-data.frame(seed=SEED[j],method=method[i],result=accuracy)
        Res<-rbind(Res,tmp)

    }
}

Darren Cook (best answer)

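You are doing a multinomial classification; i.e. the prediction is going to be one of three classes. h2o.predict() therefore returns 4 columns (the predicted class plus one probability column per class), and trying to assign all four to the single column testHex$h20 is what triggers "Can only append one column":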

> predict(ann,newdata = testHex)
  predict    setosa   versicolor    virginica
1  setosa 0.9999930 7.032604e-06 1.891484e-30
2  setosa 0.9998726 1.274161e-04 2.791200e-28
3  setosa 0.9999923 7.679687e-06 1.101218e-29
4  setosa 0.9999838 1.619749e-05 1.593254e-28
5  setosa 0.9999978 2.150244e-06 7.174795e-31
6  setosa 0.9999932 6.844831e-06 5.511857e-29

[150 rows x 4 columns]

I'm not completely sure what you are trying to do, but once you have the predictions like this:

p = predict(ann,newdata = testHex)

You can do this to get a 1 for a correct answer, 0 for a mistake:

p$predict == testHex$Species

Or, doing it client-side:

p = as.data.frame( predict(ann,newdata = testHex) )
p$predict == iris$Species
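
Putting that together, the original loop could be rewritten roughly as below. Treat it as a sketch rather than a tested drop-in: it assumes a local H2O cluster can be started, it trains and scores on the full iris data just as the question does, and the names `seeds`, `methods` and `results` are my own. The predictions are kept in their own variable and `testHex` is never overwritten, so it is still an H2O frame on the next iteration.

```r
library(h2o)
h2o.init(nthreads = -1)

trainHex <- as.h2o(iris)
testHex  <- as.h2o(iris)   # same data as training, as in the question
x_names  <- colnames(iris)[1:4]

seeds   <- c(123456789, 12345678, 1234567)
methods <- c("Rectifier", "Tanh", "TanhWithDropout",
             "RectifierWithDropout", "Maxout", "MaxoutWithDropout")

results <- list()
for (m in methods) {
  for (s in seeds) {
    ann <- h2o.deeplearning(
      x = x_names,
      y = "Species",
      training_frame = trainHex,
      activation = m,
      seed = s,
      reproducible = TRUE,
      standardize = TRUE,
      epochs = 50
    )

    # Bring the 4-column prediction frame client-side; testHex is left untouched
    p <- as.data.frame(h2o.predict(ann, newdata = testHex))

    # Share of rows where the predicted class matches the true class
    accuracy <- mean(as.character(p$predict) == as.character(iris$Species))

    results[[length(results) + 1]] <-
      data.frame(seed = s, method = m, result = accuracy)
  }
}

# One row per (method, seed) combination
Res <- do.call(rbind, results)
Res
```

Collecting the rows in a list and binding once at the end avoids growing a data frame inside the loop, but rbind() on each iteration would work here too.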

More generally, h2o.grid() is better for experimenting with alternative parameters. I think this might be closer to your intention:

parts = h2o.splitFrame(as.h2o(iris), 0.8, seed=123)
trainHex = parts[[1]]
testHex = parts[[2]]

g = h2o.grid("deeplearning",
 hyper_params = list(
   seed = c(123456789,12345678,1234567),
   activation = c("Rectifier", "Tanh", "TanhWithDropout", "RectifierWithDropout", "Maxout", "MaxoutWithDropout")
   ),
 reproducible = TRUE,
 x = 1:4,
 y = 5,
 training_frame = trainHex,
 validation_frame = testHex,
 epochs = 1
 )
g  #Output the grid

(I've set epochs to 1 just to get it to finish quickly. Set to 50 if you want.)

I've used splitFrame() to use 80% as training data, 20% as test data; by assigning the test data to validation_frame the grid will score on that unseen data automatically for us.
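
To pull the grid results into an ordinary data frame (which is what the question was after), something along these lines should work. It is a sketch under a few assumptions: the grid_id value "dl_grid" is a name I made up, a local H2O cluster is available, and I sort by logloss because that metric is reported for multinomial models.

```r
library(h2o)
h2o.init(nthreads = -1)

parts    <- h2o.splitFrame(as.h2o(iris), 0.8, seed = 123)
trainHex <- parts[[1]]
testHex  <- parts[[2]]

g <- h2o.grid("deeplearning",
  grid_id = "dl_grid",   # hypothetical name, used to look the grid up again
  hyper_params = list(
    seed = c(123456789, 12345678, 1234567),
    activation = c("Rectifier", "Tanh", "TanhWithDropout",
                   "RectifierWithDropout", "Maxout", "MaxoutWithDropout")
  ),
  reproducible = TRUE,
  x = 1:4,
  y = 5,
  training_frame = trainHex,
  validation_frame = testHex,
  epochs = 1
)

# Re-fetch the grid sorted by validation logloss, best (lowest) first
sorted <- h2o.getGrid("dl_grid", sort_by = "logloss", decreasing = FALSE)

# One row per model: hyper-parameter values, model id and the sort metric
Res <- as.data.frame(sorted@summary_table)
Res

# Inspect the best model on the held-out 20%
best <- h2o.getModel(sorted@model_ids[[1]])
perf <- h2o.performance(best, newdata = testHex)
h2o.confusionMatrix(perf)
```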