How to get the field name of a PrimitiveObjectInspector in Java

I am trying to solve this problem for a UDF I am writing for a HiveQL environment.

public ObjectInspector initialize(ObjectInspector[] arguments)
            throws UDFArgumentException {
    if (arguments.length != 1) {
        throw new UDFArgumentException("Usage : multiple_prop(primitive var) ");
    }
    // This will be a string
    moi = (PrimitiveObjectInspector) arguments[0];

    ArrayList<String> structFieldNames = new ArrayList<String>();
    ArrayList<ObjectInspector> structFieldObjectInspectors = new ArrayList<ObjectInspector>();

    structFieldNames.add("fields name"); // <-- Issue is here

How can I get the field name there? It is easy to do with a StructObjectInspector, but how do we manage it with a PrimitiveObjectInspector?

The complete code is:

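package UDFpack;

import java.util.ArrayList;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;

public class prop_step2 extends GenericUDF {
    // Inspector for the single primitive argument
    private PrimitiveObjectInspector moi;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments)
            throws UDFArgumentException {
        if (arguments.length != 1) {
            throw new UDFArgumentException("Usage : multiple_prop(primitive var) ");
        }
        // This will be a string
        moi = (PrimitiveObjectInspector) arguments[0];

        ArrayList<String> structFieldNames = new ArrayList<String>();
        ArrayList<ObjectInspector> structFieldObjectInspectors = new ArrayList<ObjectInspector>();
        // Change this to get the input variable name, and not the type name
        structFieldNames.add(moi.getTypeName()); // <-- Change this to field name
        structFieldObjectInspectors.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);

        return ObjectInspectorFactory.getStandardStructObjectInspector(structFieldNames, structFieldObjectInspectors);
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        // The returned struct has a single field holding the input value as Text
        Object[] result = new Object[1];
        Text elem1 = new Text((String) moi.getPrimitiveJavaObject(arguments[0].get()));
        result[0] = elem1;
        return result;
    }

    @Override
    public String getDisplayString(String[] children) {
        return "stop";
    }
}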

Once this works, I would like to call the UDF from Hive:

CREATE TEMPORARY FUNCTION step AS 'UDFpack.prop_step2';
SELECT step(bit) AS sd
FROM my_table;

I would then expect that referencing sd.bit in an outer select returns the value of 'bit'.

There is 1 answer

Answered by Clemens Valiente

It's simply not possible. The ObjectInspectors passed to the UDF do not carry the name of the column or expression they describe. That's why you see the output column names changed to _col0, _col1, ... in the intermediate stages of a Hive explain plan. I am also quite annoyed by this and think it is an oversight in Hive.

A workaround is to wrap your input in a struct and parse that, i.e. step(named_struct('bit', bit)). You can then read the field name from the struct inside your UDF. It's not nearly as nice, though.
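
A minimal sketch of what that workaround could look like on the UDF side, assuming the function is always called with a struct that has exactly one primitive field. The class name PropStepStruct is hypothetical, and converting the value to a string via toString() is a simplification carried over from the question's code rather than anything Hive requires. It needs the Hive and Hadoop libraries on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StructField;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;

// Hypothetical class name; expects a call such as step(named_struct('bit', bit)).
public class PropStepStruct extends GenericUDF {
    private StructObjectInspector inputOI;   // inspector for the struct argument
    private StructField field;               // the single field inside that struct
    private PrimitiveObjectInspector fieldOI;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments)
            throws UDFArgumentException {
        if (arguments.length != 1
                || arguments[0].getCategory() != ObjectInspector.Category.STRUCT) {
            throw new UDFArgumentException("Usage: step(named_struct('name', value))");
        }
        inputOI = (StructObjectInspector) arguments[0];

        // Unlike a primitive inspector, a struct inspector exposes its field names.
        List<? extends StructField> fields = inputOI.getAllStructFieldRefs();
        if (fields.size() != 1
                || fields.get(0).getFieldObjectInspector().getCategory()
                        != ObjectInspector.Category.PRIMITIVE) {
            throw new UDFArgumentException("Expected a struct with exactly one primitive field");
        }
        field = fields.get(0);
        fieldOI = (PrimitiveObjectInspector) field.getFieldObjectInspector();

        // Reuse the input field name as the name of the output struct field.
        List<String> names = new ArrayList<String>();
        List<ObjectInspector> inspectors = new ArrayList<ObjectInspector>();
        names.add(field.getFieldName());
        inspectors.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(names, inspectors);
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object struct = arguments[0].get();
        Object value = null;
        if (struct != null) {
            Object raw = inputOI.getStructFieldData(struct, field);
            value = (raw == null) ? null : fieldOI.getPrimitiveJavaObject(raw);
        }
        // Simplification: every value is emitted as its string form.
        return new Object[] { value == null ? null : new Text(value.toString()) };
    }

    @Override
    public String getDisplayString(String[] children) {
        return "step(" + String.join(", ", children) + ")";
    }
}
```

Registered as step, this would be called as step(named_struct('bit', bit)) AS sd, after which sd.bit in an outer select should resolve to the wrapped value.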