I have to implement a graph algorithm using MapReduce. For this I have to chain jobs:
MAP1 -> REDUCE1 -> MAP2 -> REDUCE2 -> ...
I read the adjacency matrix from a file in MAP1 and create a user-defined Java class Node
that contains the node's data and its child information. I want to pass this information to MAP2.
But in REDUCE1, when I write
context.write(node, NullWritable.get());
the node data gets saved to the output file as text, using the toString()
of the Node class.
When the MAP2 tries to read this Node information,
public void map(LongWritable key, Node node, Context context) throws IOException, InterruptedException
it fails, saying that it cannot convert the text in the file to a Node.
I am not sure what the right approach is for this kind of job chaining in MapReduce.
REDUCE1 writes the Node in this format:
Node [nodeId=1, adjacentNodes=[Node [nodeId=2, adjacentNodes=[]], Node [nodeId=2, adjacentNodes=[]]]]
Actual exception:
java.lang.Exception: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to custom.node.nauty.Node
Based on the comments, the suggested changes that will make your code work are the following:
You should use SequenceFileOutputFormat as the output format of the first job (the one containing REDUCE1) and SequenceFileInputFormat as the input format of the second job (the one containing MAP2), instead of TextOutputFormat and TextInputFormat, respectively. TextInputFormat hands the mapper a LongWritable key and a Text value, which is why you get this error.
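For example, the driver could wire the two jobs together roughly like this. This is only a sketch: GraphDriver, Map1, Reduce1, Map2, Reduce2 and the intermediate path are placeholders for your own classes and paths, not names taken from your code.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class GraphDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Job 1: reads the text adjacency matrix and writes (Node, NullWritable)
        // pairs as a binary sequence file, so the Node objects survive intact.
        Job job1 = Job.getInstance(conf, "graph-job-1");
        job1.setJarByClass(GraphDriver.class);
        job1.setMapperClass(Map1.class);
        job1.setReducerClass(Reduce1.class);
        job1.setOutputKeyClass(Node.class);
        job1.setOutputValueClass(NullWritable.class);
        job1.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileInputFormat.addInputPath(job1, new Path(args[0]));
        FileOutputFormat.setOutputPath(job1, new Path("intermediate"));
        if (!job1.waitForCompletion(true)) System.exit(1);

        // Job 2: reads the sequence file back, so map() receives the Node key directly.
        Job job2 = Job.getInstance(conf, "graph-job-2");
        job2.setJarByClass(GraphDriver.class);
        job2.setMapperClass(Map2.class);
        job2.setReducerClass(Reduce2.class);
        job2.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.addInputPath(job2, new Path("intermediate"));
        FileOutputFormat.setOutputPath(job2, new Path(args[1]));
        System.exit(job2.waitForCompletion(true) ? 0 : 1);
    }
}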
Accordingly, you should also change the declaration of the second mapper to accept a Node key and a NullWritable value.
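The second mapper's signature would then look roughly like this (a sketch; the output types Node/NullWritable are an assumption, use whatever your second reducer expects):

import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Mapper;

public class Map2 extends Mapper<Node, NullWritable, Node, NullWritable> {
    @Override
    public void map(Node key, NullWritable value, Context context)
            throws IOException, InterruptedException {
        // 'key' is the deserialized Node written by REDUCE1; process it and
        // emit whatever your algorithm needs for REDUCE2.
        context.write(key, NullWritable.get());
    }
}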
Make sure that the Node class implements the Writable interface (or WritableComparable if you use it as a key). Then, set the outputKeyClass of the first job to Node.class, instead of Text.class.
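A minimal sketch of what the Node class then needs to provide, assuming the fields match the toString() output you posted (the recursive serialization of the adjacency list is one possible layout, not the only one):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.WritableComparable;

public class Node implements WritableComparable<Node> {
    private int nodeId;
    private List<Node> adjacentNodes = new ArrayList<>();

    public Node() { }                        // Hadoop requires a no-arg constructor

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(nodeId);
        out.writeInt(adjacentNodes.size());
        for (Node n : adjacentNodes) {
            n.write(out);                     // recursively serialize children
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        nodeId = in.readInt();
        int size = in.readInt();
        adjacentNodes = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            Node n = new Node();
            n.readFields(in);                 // recursively deserialize children
            adjacentNodes.add(n);
        }
    }

    @Override
    public int compareTo(Node other) {        // needed because Node is used as a key
        return Integer.compare(nodeId, other.nodeId);
    }

    @Override
    public int hashCode() {                   // keep partitioning consistent with compareTo()
        return nodeId;
    }
}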