MapReduce basics


I have a 300 MB text file and the block size is 128 MB, so three blocks are created in total: 128 + 128 + 44 MB. Correct me if I am wrong: in MapReduce the default input split size is the same as the block size (128 MB here), and it can be configured. The record reader then reads through each split and creates key-value pairs, where the key is the byte offset and the value is a single line (TextInputFormat). My question: if a block ends in the middle of a line and the line finishes in the next block, will the rest of the line be fetched from the other node, or will the remainder of the line be processed on that other node? Also, how does the second node know that its first line has already been taken for processing and does not need to be processed again?

E.g.: This is stackoverflow.This (end of block 1 / input split) is a map reduce example. (end of line)


There is 1 answer

Manish Pansari

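Three mappers will be created for this scenario, one per input split. An input split is only a logical range of bytes, so the record reader is not confined to its own block. Mapper 1's reader keeps reading past the end of its split until it reaches the end of the line, and the HDFS client fetches those remaining bytes from block 2 (over the network if that block is stored on another node; the block locations come from the NameNode, not from a pointer stored in the block). So mapper 1 processes the complete line, even though part of it belongs to block 2. Mapper 2 starts by skipping everything up to the first line break in its split, because every reader except the one for the first split discards its first, possibly partial, line.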
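
The rule above can be tried out on a small scale. The following is only a simplified sketch in Python, not Hadoop's actual code: it models the behaviour of a line-oriented record reader on an in-memory byte string, assuming `\n` line endings and no compression. The function names `make_splits` and `read_split` are made up for this illustration.

```python
def make_splits(file_len: int, split_size: int) -> list[tuple[int, int]]:
    """Cut a file of file_len bytes into (start, length) byte ranges."""
    return [(start, min(split_size, file_len - start))
            for start in range(0, file_len, split_size)]


def read_split(data: bytes, start: int, length: int) -> list[tuple[int, bytes]]:
    """Return the (offset, line) records that belong to one split.

    Simplified model of a line record reader:
      * every split except the first throws away its first line, because
        the reader of the previous split has already consumed it;
      * a reader keeps going while its position is <= the split end, so it
        finishes a line that crosses the boundary, and also takes a line
        that starts exactly at the boundary.
    """
    end = start + length
    pos = start
    if start != 0:
        # Skip the first (possibly partial) line of this split.
        nl = data.find(b"\n", pos)
        pos = len(data) if nl == -1 else nl + 1

    records = []
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        # Reading data[pos:stop] may run past `end`; in HDFS this is the
        # part that would be fetched from the next block.
        records.append((pos, data[pos:stop]))
        pos = stop + 1
    return records


if __name__ == "__main__":
    data = b"This is stackoverflow.This is a map reduce example.\nNext line.\n"
    # A split size of 27 bytes puts the boundary right after the second "This ".
    for start, length in make_splits(len(data), 27):
        print((start, length), read_split(data, start, length))
```

With a split size of 27 bytes the first split ends in the middle of the first line, as in the example from the question. The first reader still returns the whole line, and the second reader skips the tail of that line and starts with "Next line.", so every line is processed exactly once.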