Load and run h2o MOJO model without H2O cluster


I have several models trained with H2O in Python. I saved them as MOJO files (one .zip per model, plus h2o-genmodel.jar) using the code:

H2O_Model.download_mojo(path=Model_Path, get_genmodel_jar=True)

Inside the zip files, there are 2 folders ("domains" and "experimental") and a model.ini file, which has info regarding the models.

I need to run those models on an old Raspberry Pi 2 to make predictions on data provided in CSV files. I can do it by running the models inside an H2O cluster using the following code:

# Start the H2O cluster
h2o.init()

# Import the MOJO file
mojo_path = model_mojo
imported_model = h2o.import_mojo(mojo_path)

# Run Model
new_observations = h2o.H2OFrame(df_prever)
predictions = imported_model.predict(new_observations)

# Save Results
predictions = predictions.as_data_frame()  # convert the H2OFrame to a pandas DataFrame
data_results = pd.concat([df_prever, predictions], axis=1)
data_results = data_results.rename(columns={"predict": "Predicted_Values"})
data_results = data_results.infer_objects()
data_results.to_csv("./Data_Results.csv", sep=delimiter, decimal=decimal_sep, index=False)

This gives me a CSV with the prediction results.

The problem is that the cluster takes a long time to start on a Raspberry Pi 2. I read that it is possible to run the models without a cluster, but I couldn't get it to work.

I tried to run the models without starting the cluster, with no success. I also tried to leave a cluster running from boot, but that didn't work either: every time I run a script to make predictions, a cluster starts and then shuts down when the script finishes.

Can anyone please help me with that? My models are of assorted types: Deep Learning, XGBoost, StackedEnsemble, GBM, etc. They cover regression, multi-class, and binary classification, so a flexible way to deal with different models would be much appreciated.

Below is the complete code I use to load the models with the h2o cluster.

import pandas as pd
from pandas import read_csv
import h2o

# User defined Variables

# Data file to make predictions
csv_file = "Data.csv"

# CSV parameters
delimiter=";"
decimal_sep=","
encoding="ascii"

# Model zip file
model_mojo="DeepLearning_grid_1_AutoML_1_20231116_204416_model_581.zip"

# Load data
df_prever = read_csv(filepath_or_buffer = csv_file, sep=delimiter, decimal=decimal_sep, encoding=encoding)
df_prever = df_prever.reset_index(drop=True)

# Start the H2O cluster
h2o.init()

# Import the MOJO file
mojo_path = model_mojo
imported_model = h2o.import_mojo(mojo_path)

# Run Model
new_observations = h2o.H2OFrame(df_prever)
predictions = imported_model.predict(new_observations)

# Save Results
predictions = predictions.as_data_frame()  # convert the H2OFrame to a pandas DataFrame
data_results = pd.concat([df_prever, predictions], axis=1)
data_results = data_results.rename(columns={"predict": "Predicted_Values"})
data_results = data_results.infer_objects()
data_results.to_csv("./Data_Results.csv", sep=delimiter, decimal=decimal_sep, index=False)

# Close H2O
h2o.cluster().shutdown()

I tried to call the model from Java (I probably did it wrong) using h2o-genmodel.jar and the zip file, but it asks for a "main" file that I don't have. I read the following documentation:

h2o-docs/productionizing, h2o-docs/mojo-capabilities, h2o-genmodel/javadoc, Tutorial/mojo-resource, examples/h2o_mojo


There is 1 answer

Wendy

The documentation page you should use is the MOJO quick start: https://docs.h2o.ai/h2o/latest-stable/h2o-docs/mojo-quickstart.html

Please follow it to generate your model and save it as a MOJO. You will end up with two files: yourModelMojoXXX.zip and h2o-genmodel.jar.

Do not follow the Java step in that link. It only shows how to generate a prediction for one row of data at a time, which is not what you are looking for. Instead, follow the steps listed in this issue: https://github.com/h2oai/h2o-3/issues/15935. That should allow you to call your MOJO model with an input CSV file and write the predictions to an output CSV file without having an h2o-3 cluster running.
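As a rough illustration of what cluster-free CSV scoring can look like, here is a minimal sketch that calls the PredictCsv command-line tool shipped inside h2o-genmodel.jar (class hex.genmodel.tools.PredictCsv) from Python. This is not necessarily the exact procedure from the issue above. The helper names (normalize_csv, build_predict_command, run_prediction) and the output file names are my own, made up for this example. I am also assuming the tool expects a comma-delimited file with decimal points, so the sketch first rewrites the ";"-delimited, decimal-comma file from the question. It needs a Java runtime on the PATH.

```python
import csv
import re
import subprocess

# Matches plain decimal-comma numbers such as "3,14" or "-0,5".
_DECIMAL_COMMA = re.compile(r"^-?\d+,\d+$")


def normalize_csv(src_path, dst_path, delimiter=";", encoding="ascii"):
    """Rewrite a ';'-delimited, decimal-comma CSV as a ','-delimited one.

    Assumption: numeric fields have no thousands separators.
    """
    with open(src_path, newline="", encoding=encoding) as src, \
         open(dst_path, "w", newline="", encoding=encoding) as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src, delimiter=delimiter):
            writer.writerow(
                [v.replace(",", ".") if _DECIMAL_COMMA.match(v) else v for v in row]
            )


def build_predict_command(mojo_zip, input_csv, output_csv,
                          genmodel_jar="h2o-genmodel.jar", java="java"):
    """Build the command line for the PredictCsv tool in h2o-genmodel.jar."""
    return [
        java, "-cp", genmodel_jar, "hex.genmodel.tools.PredictCsv",
        "--mojo", mojo_zip,
        "--input", input_csv,
        "--output", output_csv,
        "--decimal",  # write decimal numbers instead of the default hex format
    ]


def run_prediction(raw_csv, mojo_zip, output_csv,
                   normalized_csv="Data_normalized.csv"):
    """Normalize the input CSV, then score it with the MOJO (no H2O cluster)."""
    normalize_csv(raw_csv, normalized_csv)
    cmd = build_predict_command(mojo_zip, normalized_csv, output_csv)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if Java fails
```

With the files from the question, the call would be run_prediction("Data.csv", "DeepLearning_grid_1_AutoML_1_20231116_204416_model_581.zip", "Predictions.csv"). The same function should work for any model type that MOJO scoring supports, since only the zip file changes. As far as I know, the h2o Python package also provides h2o.mojo_predict_csv and h2o.mojo_predict_pandas, which wrap the same tool, but importing h2o on the Pi just for that may not be worth it.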

I hope this works for you.