How does a RecordReader send data to the mapper in Hadoop?


I'm new to Hadoop and am currently learning MapReduce design patterns from Donald Miner and Adam Shook's book MapReduce Design Patterns. The book describes a Cartesian Product pattern. My questions are:

  1. When does the record reader send data to the mapper?
  2. Where is the code that sends the data to the mapper?

From what I can see, the next() method in the CartesianRecordReader class reads both splits without sending the data anywhere.

Here is the source code: https://github.com/adamjshook/mapreducepatterns/blob/master/MRDP/src/main/java/mrdp/ch5/CartesianProduct.java

That's all, thanks in advance :)

Answer by Vicente Bolea (accepted answer)

When does the record reader send data to the mapper?

Let me answer by giving you an idea of how the mapper and the RecordReader are related. This is the Hadoop code that sends data to the mapper:

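  // From org.apache.hadoop.mapred.MapRunner (old "mapred" API)
  public void run(RecordReader<K1, V1> input, OutputCollector<K2, V2> output,
                  Reporter reporter) throws IOException {
    try {
      // allocate key & value instances that are re-used for all entries
      K1 key = input.createKey();
      V1 value = input.createValue();

      // next() fills key and value in place; it returns false at the end of the split
      while (input.next(key, value)) {
        // map pair to output
        mapper.map(key, value, output, reporter);
        // only counts processed records when record skipping is enabled
        if (incrProcCount) {
          reporter.incrCounter(SkipBadRecords.COUNTER_GROUP,
              SkipBadRecords.COUNTER_MAP_PROCESSED_RECORDS, 1);
        }
      }
    } finally {
      mapper.close();
    }
  }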

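Basically, Hadoop calls next() until it returns false, and on every call the same key and value objects are filled with new values. With the default TextInputFormat, the key is the byte offset of the line within the file and the value is the line itself. So a RecordReader never sends anything to the mapper: its next() only fills key and value, and the framework loop passes them to map() each time next() returns true. That is why CartesianRecordReader.next() appears to read both splits without sending the data.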
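To make the pull model concrete without a Hadoop dependency, here is a minimal, hypothetical Java sketch. The names PullLoopDemo, LineRecordReaderSketch, MapperSketch and the mutable holder classes are made up for illustration and are not Hadoop APIs; the holders stand in for LongWritable and Text, and the reader imitates a line-based reader by using offsets as keys (assuming one byte per character). The point is only that the driver loop, like the MapRunner loop in the snippet above, owns the control flow and calls the map function each time next() returns true.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Hadoop-free sketch of the pull model used by MapRunner:
// the framework owns the loop, the reader only fills reusable key/value holders.
public class PullLoopDemo {

    // Mutable holders, standing in for Hadoop's LongWritable / Text (not real Hadoop types).
    static final class MutableLong { long value; }
    static final class MutableText { String value = ""; }

    // Hypothetical reader: emits (offset, line) pairs like a line-based reader.
    // Offsets count characters, assuming one byte per character.
    static final class LineRecordReaderSketch {
        private final String[] lines;
        private int index = 0;
        private long offset = 0;

        LineRecordReaderSketch(String data) {
            this.lines = data.isEmpty() ? new String[0] : data.split("\n");
        }

        // Fills key/value in place; returns false when the input is exhausted.
        boolean next(MutableLong key, MutableText value) {
            if (index >= lines.length) return false;
            key.value = offset;
            value.value = lines[index];
            offset += lines[index].length() + 1; // +1 for the '\n' separator
            index++;
            return true;
        }
    }

    // Hypothetical map function signature for this sketch only.
    interface MapperSketch { void map(long key, String value, List<String> output); }

    // Same shape as the MapRunner loop: map() runs only after next() returns true.
    static List<String> run(LineRecordReaderSketch input, MapperSketch mapper) {
        List<String> output = new ArrayList<>();
        MutableLong key = new MutableLong();   // allocated once, reused for every record
        MutableText value = new MutableText();
        while (input.next(key, value)) {
            mapper.map(key.value, value.value, output);
        }
        return output;
    }

    public static void main(String[] args) {
        LineRecordReaderSketch reader = new LineRecordReaderSketch("hello\nhadoop\nworld");
        List<String> out = run(reader, (k, v, o) -> o.add(k + ":" + v));
        System.out.println(out); // [0:hello, 6:hadoop, 13:world]
    }
}
```

The reader never calls the mapper; it just reports whether another record was loaded into the shared holders, and the loop decides what happens next.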

Where is the code that sends the data to the mapper?

That code lives in the Hadoop source itself (probably in the MapContextImpl class), but it resembles what I wrote in the code snippet above.

EDIT: The source code is in the MapRunner class.
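Applied to the Cartesian Product question, the same idea can be sketched as follows. This is a simplified, hypothetical in-memory analogue, not the CartesianRecordReader from the book's repository: CartesianSketch, CartesianReaderSketch and Holder are made-up names, and two in-memory lists replace the two input splits. Its next() only loads the next (left, right) combination into the key and value holders and returns true; the driver loop, shaped like MapRunner's, is what passes each pair on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, in-memory analogue of a Cartesian-product record reader.
// next() only fills key/value; the caller's loop hands each pair to "map".
public class CartesianSketch {

    // Stand-in for a reusable Text object (not a Hadoop type).
    static final class Holder { String value = ""; }

    static final class CartesianReaderSketch {
        private final List<String> left;
        private final List<String> right;
        private int i = 0; // position in the left input
        private int j = 0; // position in the right input

        CartesianReaderSketch(List<String> left, List<String> right) {
            this.left = left;
            this.right = right;
        }

        // Emits every (left, right) combination, advancing the right side fastest.
        boolean next(Holder key, Holder value) {
            if (right.isEmpty() || i >= left.size()) return false;
            key.value = left.get(i);
            value.value = right.get(j);
            j++;
            if (j == right.size()) { // right side exhausted: restart it and advance left
                j = 0;
                i++;
            }
            return true;
        }
    }

    // Same driver shape as MapRunner: pull a pair, then call the map step.
    static List<String> run(CartesianReaderSketch reader) {
        List<String> mapped = new ArrayList<>();
        Holder key = new Holder();   // reused for every pair
        Holder value = new Holder();
        while (reader.next(key, value)) {
            mapped.add("(" + key.value + "," + value.value + ")"); // stand-in for mapper.map(...)
        }
        return mapped;
    }

    public static void main(String[] args) {
        CartesianReaderSketch reader = new CartesianReaderSketch(
            Arrays.asList("a", "b"), Arrays.asList("1", "2", "3"));
        System.out.println(run(reader)); // [(a,1), (a,2), (a,3), (b,1), (b,2), (b,3)]
    }
}
```

Restarting the right side once it is exhausted is what produces every combination in this sketch; how the real class does this with its underlying record readers is best checked in the linked source.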