Hello, when I try to use Hivemall's regression toolkit, I run into errors while building the feature representation.

I've been following this guide, https://hivemall.incubator.apache.org/userguide/supervised_learning/tutorial.html, and trying to reproduce it. I used the code exactly as provided, but the query fails when I run it.

My issue seems to be with this part of the guide:

create table if not exists purchase_history as
select 1 as id, "Saturday" as day_of_week, "male" as gender, 600 as price, "book" as category, 1 as label
union all
select 2 as id, "Friday" as day_of_week, "female" as gender, 4800 as price, "sports" as category, 0 as label
union all
select 3 as id, "Friday" as day_of_week, "other" as gender, 18000 as price, "entertainment" as category, 0 as label
union all
select 4 as id, "Thursday" as day_of_week, "male" as gender, 200 as price, "food" as category, 0 as label
union all
select 5 as id, "Wednesday" as day_of_week, "female" as gender, 1000 as price, "electronics" as category, 1 as label
;

create table if not exists training as
select
  id,
  array_concat( -- concatenate two arrays of quantitative and categorical features into single array
    quantitative_features(
      array("price"), -- quantitative feature names
      price -- corresponding column names
    ),
    categorical_features(
      array("day of week", "gender", "category"), -- categorical feature names
      day_of_week, gender, category -- corresponding column names
    )
  ) as features,
  label
from
  purchase_history
;

This is copied straight from the guide linked above.

When I run it, I get this error:

at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:211)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating id
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:149)
    at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:966)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:939)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
    at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.closeOp(VectorMapOperator.java:990)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:733)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:193)
    ... 9 more
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.exec.UDFArgumentException: argument must be a constant value: array<string>
    at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:106)
    at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:271)
    at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:111)
    at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
    ... 15 more
Caused by: org.apache.hadoop.hive.ql.exec.UDFArgumentException: argument must be a constant value: array<string>
    at hivemall.utils.hadoop.HiveUtils.getConstStringArray(HiveUtils.java:502)
    at hivemall.ftvec.trans.QuantitativeFeaturesUDF.initialize(QuantitativeFeaturesUDF.java:80)
    at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.init(VectorUDFAdaptor.java:89)
    at org.apache.hadoop.hive.ql.exec.vector.udf.VectorUDFAdaptor.evaluate(VectorUDFAdaptor.java:104)
    ... 18 more

However, when I run the same select on its own, without the create table, I get the correct results:

1    "price:600.0","day of week#Saturday","gender#male","category#book"    1
2    "price:4800.0","day of week#Friday","gender#female","category#sports"    0
3    "price:18000.0","day of week#Friday","gender#other","category#entertainment"    0
4    "price:200.0","day of week#Thursday","gender#male","category#food"    0
5    "price:1000.0","day of week#Wednesday","gender#female","category#electronics"    1

Any idea why I am not able to save this information in a table?

1 Answer

leftjoin

Try creating the purchase_history table without union all, using stack instead:

create table if not exists purchase_history as
select id, day_of_week, gender, price, category, label
from
(
  select stack(5,
    1, "Saturday" , "male"   , 600  , "book"          , 1,
    2, "Friday"   , "female" , 4800 , "sports"        , 0,
    3, "Friday"   , "other"  , 18000, "entertainment" , 0,
    4, "Thursday" , "male"   , 200  , "food"          , 0,
    5, "Wednesday", "female" , 1000 , "electronics"   , 1
  ) as (id, day_of_week, gender, price, category, label)
) s;
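
If the error persists, note that the stack trace shows quantitative_features being called through VectorUDFAdaptor, which is Hive's vectorized execution path, and that is where the array("price") argument stops being seen as a constant. A possible workaround is to turn vectorization off for the session before creating the training table. This is a sketch based on an assumption drawn from the trace, not something the guide confirms; hive.vectorized.execution.enabled is a standard Hive setting.

```sql
-- Assumption: the failure comes from vectorized execution wrapping the
-- Hivemall UDF (VectorUDFAdaptor in the stack trace), so disable it
-- for this session only.
set hive.vectorized.execution.enabled=false;

create table if not exists training as
select
  id,
  array_concat(
    quantitative_features(
      array("price"), -- quantitative feature names
      price           -- corresponding column
    ),
    categorical_features(
      array("day of week", "gender", "category"), -- categorical feature names
      day_of_week, gender, category               -- corresponding columns
    )
  ) as features,
  label
from
  purchase_history
;
```

The setting only applies to the current session. It can be set back to true afterwards if other queries rely on vectorized execution.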