I have an input file for which I need to write a custom RecordReader. The problem is that a single logical record may be spread across different input splits, so a different mapper may receive data that should be consumed by the first mapper.
For example:
A B C D
$ E F
The '$' at the beginning of a line signifies that the line is a continuation of the previous line.
Suppose the second split starts at the '$'. My first mapper then won't know that the first line continues in the next split. Please also note that there is a very good chance that a line has no continuation at all, so I can't tell whether a record continues unless I read the line that follows it.
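To make the format concrete, here is a rough sketch of the joining I have in mind, written in plain Java without any Hadoop classes. The class and method names (ContinuationJoiner, joinContinuations) are made up for illustration, and I am assuming that the continuation text is appended to the previous line with a single space. It only works when one reader sees all the lines, which is exactly what I cannot guarantee across splits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ContinuationJoiner {

    // Hypothetical helper: merges every line that starts with '$'
    // into the logical record that precedes it.
    public static List<String> joinContinuations(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (line.startsWith("$") && current != null) {
                // Continuation line: drop the marker and append the rest.
                current.append(' ').append(line.substring(1).trim());
            } else {
                // A new record starts, so the previous one is now known to be complete.
                // A '$' line with no predecessor (the split-boundary case) also lands
                // here and is kept as-is, because this reader cannot see its parent line.
                if (current != null) {
                    records.add(current.toString());
                }
                current = new StringBuilder(line);
            }
        }
        if (current != null) {
            records.add(current.toString()); // end of input closes the last record
        }
        return records;
    }

    public static void main(String[] args) {
        // prints [A B C D E F]
        System.out.println(joinContinuations(Arrays.asList("A B C D", "$ E F")));
    }
}
```

Note that a record is only emitted once the next line (or the end of the input) has been seen. That look-ahead is the part I don't know how to handle when the next line belongs to another split.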
Please help me find a solution for this problem.