I am trying to solve the following problem in a UDF I am writing for a HiveQL environment.
public ObjectInspector initialize(ObjectInspector[] arguments)
        throws UDFArgumentException {
    if (arguments.length != 1) {
        throw new UDFArgumentException("Usage : multiple_prop(primitive var) ");
    }
    // This will be a string
    moi = (PrimitiveObjectInspector) arguments[0];
    List<String> structFieldNames = new ArrayList<>();
    List<ObjectInspector> structFieldObjectInspectors = new ArrayList<>();
    structFieldNames.add("fields name"); // <-- issue is here
How can I get the field name in there? It is easy to do with a StructObjectInspector, but how do we manage this with a PrimitiveObjectInspector?
The complete code is:
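package UDFpack;

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;

public class prop_step2 extends GenericUDF {
    private PrimitiveObjectInspector moi;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments)
            throws UDFArgumentException {
        if (arguments.length != 1 || !(arguments[0] instanceof PrimitiveObjectInspector)) {
            throw new UDFArgumentException("Usage : multiple_prop(primitive var) ");
        }
        // This will be a string
        moi = (PrimitiveObjectInspector) arguments[0];
        List<String> structFieldNames = new ArrayList<>();
        List<ObjectInspector> structFieldObjectInspectors = new ArrayList<>();
        // Change this to get the input variable name, and not the type name
        structFieldNames.add(moi.getTypeName());
        structFieldObjectInspectors.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);
        // The UDF returns a struct with a single string field
        return ObjectInspectorFactory.getStandardStructObjectInspector(structFieldNames, structFieldObjectInspectors);
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object raw = arguments[0].get();
        Object value = (raw == null) ? null : moi.getPrimitiveJavaObject(raw);
        // One slot per struct field; a NULL input gives a NULL field
        Object[] result = new Object[1];
        result[0] = (value == null) ? null : new Text(String.valueOf(value));
        return result;
    }

    @Override
    public String getDisplayString(String[] children) {
        return "stop";
    }
}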
Once this is finished, I would like to call the UDF from Hive like this:
CREATE TEMPORARY FUNCTION step AS 'UDFpack.prop_step2';
select
step(bit) as sd
from my_table;
I would then expect that selecting sd.bit in an outer query returns the value of bit.
It's simply not possible. The information passed to the UDF (the ObjectInspectors) does not include the column names. That's why you see the output column names change to _col0, _col1, ... in the intermediate stages of a Hive EXPLAIN plan. I am also quite annoyed by this and think it is an oversight in Hive.
A workaround is to put your input into a struct and parse that, i.e. call step(named_struct('bit', bit)); your UDF can then read the field name from the struct's StructObjectInspector. But it's not nearly as nice.
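A minimal sketch of that workaround, assuming Hive's GenericUDF API with hive-exec on the classpath. The class name StructStep is hypothetical. In initialize() it reads each field name from the StructObjectInspector and reuses it as the output field name, so the name survives into the returned struct; values are converted to strings, as in the question's code.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StructField;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;

// Hypothetical UDF: step(named_struct('bit', bit)) returns a struct whose
// field names are copied from the input struct, with values as strings.
public class StructStep extends GenericUDF {
    private StructObjectInspector soi;
    private List<? extends StructField> fields;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 1 || !(arguments[0] instanceof StructObjectInspector)) {
            throw new UDFArgumentException("Usage: step(named_struct('name', col, ...))");
        }
        soi = (StructObjectInspector) arguments[0];
        fields = soi.getAllStructFieldRefs();

        List<String> names = new ArrayList<>();
        List<ObjectInspector> ois = new ArrayList<>();
        for (StructField field : fields) {
            if (!(field.getFieldObjectInspector() instanceof PrimitiveObjectInspector)) {
                throw new UDFArgumentException("Field " + field.getFieldName() + " must be a primitive");
            }
            // Unlike a bare PrimitiveObjectInspector, a struct field carries its name
            names.add(field.getFieldName());
            ois.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);
        }
        return ObjectInspectorFactory.getStandardStructObjectInspector(names, ois);
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object struct = arguments[0].get();
        if (struct == null) {
            return null;
        }
        Object[] result = new Object[fields.size()];
        for (int i = 0; i < fields.size(); i++) {
            StructField field = fields.get(i);
            PrimitiveObjectInspector poi = (PrimitiveObjectInspector) field.getFieldObjectInspector();
            Object raw = soi.getStructFieldData(struct, field);
            Object value = (raw == null) ? null : poi.getPrimitiveJavaObject(raw);
            // Copy each input field into the same position of the output struct
            result[i] = (value == null) ? null : new Text(String.valueOf(value));
        }
        return result;
    }

    @Override
    public String getDisplayString(String[] children) {
        return "step(" + String.join(", ", children) + ")";
    }
}
```

Once registered (e.g. CREATE TEMPORARY FUNCTION step AS 'StructStep'; or the fully qualified name if you put the class in a package such as UDFpack), a query along the lines of SELECT t.sd.bit FROM (SELECT step(named_struct('bit', bit)) AS sd FROM my_table) t should give you the value of bit as a string. Treat it as a starting point rather than a tested implementation.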