Input splits in MapReduce jobs


I have an input file for which I need to write a custom RecordReader. The problem is that a single logical record may be spread across two input splits, so a different mapper may receive data that should be consumed by the first mapper.

For example:
A B C D
$ E F

The '$' at the beginning of a line signifies that the line is a continuation of the previous line.

Suppose the second split starts at the '$'. My first mapper won't know that the first line has a continuation. Please also note that there is a very good chance my data has no second line at all, so I can't tell whether a line is continued unless I read the line after it.

Please help me find a solution for this problem.
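
To make the boundary case concrete, here is a minimal sketch of one possible rule, written without any Hadoop classes so it can be run on its own. It simulates splits as byte ranges over an in-memory array. The class name `ContinuationSplitSketch` and its helper methods are hypothetical, and the sketch assumes newline-terminated lines, a single-byte '$' marker, and that the file never begins with a continuation line. The rule is: a reader owns every record whose first line starts inside its split, it skips leading '$' lines (they belong to the previous split's record), and it keeps reading past the end of its split while the following lines start with '$'.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ContinuationSplitSketch {

    // Returns the logical records owned by the split [start, end) of data.
    public static List<String> readSplit(byte[] data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        // A split that begins mid-line does not own that line: move to the next line start.
        if (pos > 0 && data[pos - 1] != '\n') {
            pos = nextLineStart(data, pos);
        }
        // Leading continuation lines belong to a record owned by the previous split.
        while (pos < data.length && data[pos] == '$') {
            pos = nextLineStart(data, pos);
        }
        // Own every record whose first line starts before the end of the split.
        while (pos < end && pos < data.length) {
            StringBuilder record = new StringBuilder(line(data, pos));
            pos = nextLineStart(data, pos);
            // Read ahead, possibly past 'end', while the next line is a continuation.
            while (pos < data.length && data[pos] == '$') {
                record.append(' ').append(line(data, pos).substring(1).trim());
                pos = nextLineStart(data, pos);
            }
            records.add(record.toString());
        }
        return records;
    }

    // Offset of the first byte after the next '\n', capped at the end of the data.
    static int nextLineStart(byte[] data, int pos) {
        while (pos < data.length && data[pos] != '\n') {
            pos++;
        }
        return Math.min(pos + 1, data.length);
    }

    // The line starting at pos, without its trailing '\n'.
    static String line(byte[] data, int pos) {
        int stop = pos;
        while (stop < data.length && data[stop] != '\n') {
            stop++;
        }
        return new String(data, pos, stop - pos, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = "A B C D\n$ E F\nG H\n".getBytes(StandardCharsets.UTF_8);
        // The first split ends exactly where the '$' line begins.
        System.out.println(readSplit(data, 0, 8));           // prints [A B C D E F]
        System.out.println(readSplit(data, 8, data.length)); // prints [G H]
    }
}
```

In a real job this logic would presumably live in a custom RecordReader (skipping in `initialize`, reading ahead in `nextKeyValue`), much as Hadoop's own `LineRecordReader` reads past the end of its split to finish the last line. Reading the tail of a record from the next split may mean a remote read, which is the cost of keeping each record with a single mapper.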


There are 0 answers