Issues with a UDF

I have a UDF that accepts a bag as input and converts it to a map. The keys of the map are the distinct elements in the bag, and each value is the number of times that element occurs.

But it is failing the JUnit tests.

There are 2 answers

glefait

It seems you expect to use a bag of two tuples, but you are actually creating a bag that contains one tuple with two fields.

This code:

DataBag dataBag = bagFactory.newDefaultBag();
Tuple nestedTuple = tupleFactory.newTuple(2);
nestedTuple.set(0, "12345");
nestedTuple.set(1, "12345");
dataBag.add(nestedTuple);

should be changed to:

DataBag dataBag = bagFactory.newDefaultBag();
Tuple tupleA = tupleFactory.newTuple(1);
tupleA.set(0, "12345");
dataBag.add(tupleA);

Tuple tupleB = tupleFactory.newTuple(1);
tupleB.set(0, "12345");
dataBag.add(tupleB);

Alternatively, you can iterate over all the fields of each tuple inside your UDF.
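To make the difference between the two bag shapes concrete without a Pig runtime, here is a minimal plain-Java sketch. It models a bag as a List of tuples and a tuple as a List of fields. BagShapes, countFirstField and the two builder methods are hypothetical names, not Pig's API, and counting by the first field is only an assumption about what the UDF in the question does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for Pig's DataBag/Tuple (hypothetical, not Pig's API).
public class BagShapes {

    // Assumed UDF behavior: count tuples by the value of their first field.
    public static Map<String, Integer> countFirstField(List<List<Object>> bag) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<Object> tuple : bag) {
            String key = String.valueOf(tuple.get(0));
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    // Shape built by the original test: one tuple with two fields.
    public static List<List<Object>> oneTupleTwoFields() {
        List<List<Object>> bag = new ArrayList<>();
        bag.add(Arrays.<Object>asList("12345", "12345"));
        return bag;
    }

    // Shape the test seems to expect: two tuples with one field each.
    public static List<List<Object>> twoTuplesOneField() {
        List<List<Object>> bag = new ArrayList<>();
        bag.add(Arrays.<Object>asList("12345"));
        bag.add(Arrays.<Object>asList("12345"));
        return bag;
    }

    public static void main(String[] args) {
        System.out.println(countFirstField(oneTupleTwoFields())); // {12345=1}
        System.out.println(countFirstField(twoTuplesOneField())); // {12345=2}
    }
}
```

With first-field counting, only the second shape yields a count of 2, which would explain why the original test sees 1.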

Balduz

The output of 1 is correct: in your UDF you are counting the number of tuples that have the same value for the first field, but in the test you are adding only one tuple with two values.

If what you want is to count the number of tuples with the same value as "key" (where key is the first value in your tuple), then what you are doing is correct, but you would have to change your test:

public void testExecWithSimpleMap() throws Exception {
    Tuple inputTuple = tupleFactory.newTuple(1);
    DataBag dataBag = bagFactory.newDefaultBag();
    Tuple firstTuple = tupleFactory.newTuple(2);
    firstTuple.set(0, "12345");
    firstTuple.set(1, "another value");
    dataBag.add(firstTuple);

    // Add a second, separate tuple; reusing the first object would overwrite its fields
    Tuple secondTuple = tupleFactory.newTuple(2);
    secondTuple.set(0, "12345");
    secondTuple.set(1, "and another value");
    dataBag.add(secondTuple);
    inputTuple.set(0, dataBag);
    Map output = testClass.exec(inputTuple);
    // JUnit's assertEquals takes (expected, actual)
    assertEquals(1, output.size());
    System.out.println(output.get("12345"));
    assertEquals(2, output.get("12345"));
}

However, if you wanted to count how many times a value occurs in the whole bag, whether in different tuples or in the same tuple (this is not very clear from your question), then you need to change your UDF:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;

public class BagToMap extends EvalFunc<Map> {
    @Override
    public Map exec(Tuple input) throws IOException {
        // Nothing to count if there is no input bag
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        DataBag values = (DataBag) input.get(0);
        Map<String, Integer> m = new HashMap<String, Integer>();
        for (Tuple t : values) {
            // Iterate through the Tuple's fields as well
            for (Object o : t.getAll()) {
                if (o == null) {
                    continue; // skip null fields instead of failing on toString()
                }
                String key = o.toString();
                Integer count = m.get(key);
                m.put(key, count == null ? 1 : count + 1);
            }
        }
        return m;
    }
}

In this case, your test should pass.
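As a quick check of the all-fields counting logic outside Pig, here is a small plain-Java sketch. It again models a bag as a List of Lists; AllFieldsCount and countAllFields are hypothetical names, and skipping null fields is an assumption rather than something stated in the question.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the all-fields counting logic (hypothetical, not Pig's API).
public class AllFieldsCount {

    // Count every field of every tuple in the bag, regardless of position.
    public static Map<String, Integer> countAllFields(List<List<Object>> bag) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<Object> tuple : bag) {
            for (Object field : tuple) {
                if (field == null) {
                    continue; // assumption: null fields are ignored
                }
                counts.merge(field.toString(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two tuples: ("12345", "12345") and ("12345", "67890")
        List<List<Object>> bag = Arrays.asList(
                Arrays.<Object>asList("12345", "12345"),
                Arrays.<Object>asList("12345", "67890"));
        Map<String, Integer> counts = countAllFields(bag);
        System.out.println(counts.get("12345")); // 3
        System.out.println(counts.get("67890")); // 1
    }
}
```

A bag holding the single tuple ("12345", "12345") gives a count of 2 under this logic, which matches what the original test expects.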