Hadoop: When does the setup method get invoked in the reducer?


As far as I understand, a reduce task has three phases: shuffle, sort, and the actual reduce invocation.

So a Hadoop job's output usually looks something like this:

    map 0% reduce 0%
    map 20% reduce 0%
    . . .
    map 90% reduce 10%
    . . .

So I assume that the reduce tasks start before all the maps are finished and this behavior is controlled by the slow start configuration.

What I don't yet understand is when the reducer's setup method is actually called.

In my use case, I have a file to parse in the setup method. The file is about 60MB in size and is picked up from the distributed cache. While the file is being parsed, another set of data from the configuration can update the record that was just parsed. After parsing and any such update, the records are stored in a HashMap for fast lookups. So I would like this method to be invoked as soon as possible, ideally while the mappers are still running.
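To make the setup work concrete, here is a minimal sketch of the parse-then-override step, written against the Java standard library only so that it runs outside Hadoop. The tab-separated record format, the class name `LookupTableBuilder`, and the `overrides` map (standing in for values read from the job configuration) are assumptions for illustration; in the real job the reader would be opened on the distributed cache file inside setup().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class LookupTableBuilder {

    // Parses "key<TAB>value" lines into a map. If an override exists for a key,
    // it replaces the value that was just parsed. (Record format is an assumption.)
    public static Map<String, String> build(BufferedReader reader, Map<String, String> overrides)
            throws IOException {
        Map<String, String> table = new HashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // skip lines without a tab separator
            }
            String key = line.substring(0, tab);
            String value = line.substring(tab + 1);
            table.put(key, overrides.getOrDefault(key, value));
        }
        return table;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> overrides = new HashMap<>();
        overrides.put("b", "20"); // hypothetical override taken from configuration
        Map<String, String> table =
                build(new BufferedReader(new StringReader("a\t1\nb\t2\n")), overrides);
        System.out.println(table.get("a") + " " + table.get("b")); // prints "1 20"
    }
}
```

The override is applied while each record is parsed, so the file is read only once and the finished map is ready for lookups in reduce().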

Is it possible to do this? Or is that what already happens?

Thanks


There are 2 answers

Thomas Jungblut (best answer)

setup() is called right before the reducer is able to read the first key/value pair from the stream.

In effect, that is after all mappers have run and all the merging for the given reducer partition has finished.
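This ordering follows from how the framework drives a reducer: the reduce task first copies and merges the map output, and only then does Reducer.run() call setup(), invoke reduce() once per key, and finally call cleanup(). Below is a simplified standalone model of that ordering. It is not Hadoop's actual code; the class `ReduceTaskModel`, the `shuffleAndMerge` method, and the log strings are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceTaskModel {

    public final List<String> log = new ArrayList<>();

    // Stand-in for the copy and merge phases: groups values by key in sorted key order.
    public Map<String, List<Integer>> shuffleAndMerge(List<Map.Entry<String, Integer>> mapOutput) {
        log.add("shuffle+merge");
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> e : mapOutput) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }

    public void setup() { log.add("setup"); }

    public void reduce(String key, List<Integer> values) { log.add("reduce(" + key + ")"); }

    public void cleanup() { log.add("cleanup"); }

    // Merged input comes first, then setup, then reduce per key, then cleanup.
    public List<String> runTask(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, List<Integer>> merged = shuffleAndMerge(mapOutput);
        setup();
        try {
            for (Map.Entry<String, List<Integer>> e : merged.entrySet()) {
                reduce(e.getKey(), e.getValue());
            }
        } finally {
            cleanup(); // runs even if reduce() throws
        }
        return log;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("b", 1), Map.entry("a", 2), Map.entry("b", 3));
        // prints [shuffle+merge, setup, reduce(a), reduce(b), cleanup]
        System.out.println(new ReduceTaskModel().runTask(mapOutput));
    }
}
```

In this model, anything expensive placed in setup() cannot overlap with the map phase, because setup() only runs once the merged input is available.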

Manu Manjunath

As explained in the Hadoop docs, the setup() method is called once at the start of the task. It should be used for instantiating resources/variables or reading configurable parameters, which can then be used in the reduce() method. Think of it like a constructor.

Here is an example reducer:

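import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;

// Assumption: Jackson 2.x; the original snippet does not say which ObjectMapper is used.
import com.fasterxml.jackson.databind.ObjectMapper;

class ExampleReducer extends TableReducer<ImmutableBytesWritable, ImmutableBytesWritable, ImmutableBytesWritable> {

    private int runId;
    private ObjectMapper objectMapper;

    // Called once per reduce task, before the first call to reduce().
    @Override
    protected void setup(Context context) throws IOException {
        Configuration conf = context.getConfiguration();
        // Throws NumberFormatException if the property is missing or not a number.
        this.runId = Integer.parseInt(conf.get("stackoverflow_run_id"));
        this.objectMapper = new ObjectMapper();
    }

    // Called once per key; reuses the objects created in setup().
    @Override
    protected void reduce(ImmutableBytesWritable keyFromMap, Iterable<ImmutableBytesWritable> valuesFromMap, Context context) throws IOException, InterruptedException {
        // your code: someObject, somekey and put are placeholders for your own values
        String json = objectMapper.writeValueAsString(someObject);
        // your code: build the HBase Put (put) that stores json
        context.write(new ImmutableBytesWritable(somekey.getBytes()), put);
    }
}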