Merging columns (h2o.merge) using H2O in SparkR


In my current project, I am using the H2O machine learning library in SparkR. I have multiple .csv files and read them into h2o data frames. Now I would like to apply the h2o.merge() function to these frames to match the primary key of one h2o data frame with the foreign key of another. My main h2o data frame contains 14 columns. I get the data types of all the columns using the h2o.getTypes() function.

My understanding was that, in order to apply h2o.merge(), the key column should be of type string or numeric instead of enum or real. To convert the data types of columns, I am using the h2o.ascharacter() and h2o.asfactor() functions. I converted the enum columns to string columns and then called h2o.merge(). When I did, it displayed the following error:

[Image: SparkR console]

Am I missing anything? I took the syntax for h2o.merge() from this link: Syntax of h2o.merge function. How do I merge h2o data frames?

A sample of the factTable h2o data frame is shown below (SALES_ORG is a primary key):

[Image: sample factTable data set]

A sample of the regionTable h2o data frame is shown below (SALES_ORG is a foreign key):

[Image: sample regionTable data set]


There is 1 answer

Saurabh Chauhan On BEST ANSWER

I finally figured out the answer using the hint from the comment. The key point is that the column must be converted to factor/enum before applying the merge operation. The column holding the primary key or the foreign key should be of type factor/enum in both data frames.
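
For illustration, the fix could look roughly like the sketch below. It is not the original code from the question: the toy data, the AMOUNT and REGION_ID columns, and the local cluster started with h2o.init() are assumptions made for the example. Only the SALES_ORG key and the factTable/regionTable names come from the question.

```r
library(h2o)

# Assumption: a local H2O cluster; the original setup ran H2O from SparkR.
h2o.init()

# Hypothetical toy data standing in for the question's two tables.
fact_df <- data.frame(
  SALES_ORG = c("A100", "B200", "A100", "C300"),
  AMOUNT    = c(10.5, 20.0, 7.25, 3.0),
  stringsAsFactors = FALSE
)
region_df <- data.frame(
  SALES_ORG = c("A100", "B200", "C300"),
  REGION_ID = c(1, 2, 3),
  stringsAsFactors = FALSE
)

factTable   <- as.h2o(fact_df)
regionTable <- as.h2o(region_df)

# Convert the key column to factor/enum in BOTH frames before merging.
factTable$SALES_ORG   <- h2o.asfactor(factTable$SALES_ORG)
regionTable$SALES_ORG <- h2o.asfactor(regionTable$SALES_ORG)

# Check that the key column is now reported as "enum".
print(h2o.getTypes(factTable))
print(h2o.getTypes(regionTable))

# Merge on the shared key column (inner join by default).
merged <- h2o.merge(factTable, regionTable, by = "SALES_ORG")
print(merged)
```

If the key columns have different names in the two frames, the by.x and by.y arguments of h2o.merge() can be used instead of by.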