I have a MapReduce program whose output is currently all in text files. A sample of the program is below. What I do not understand is how to output the key/value pairs from the reducer in sequence file format. And no, I can't just use a SequenceFileFormat specifier, because I'm using the Hadoop 0.20 library.
So what do I do? Below is a sample. The word count program is just one small part of my larger program; if I know how to do it with one, I can do it with the rest. Please help.

Word Count Reducer:
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        System.out.println("reducer.output: " + key.toString() + " " + sum);
        context.write(key, new IntWritable(sum)); // RIGHT HERE!! OUTPUTS TO TEXT
    }
}
Now here is the main program that runs this (I left out the mapper and other irrelevant details):
Configuration conf = new Configuration();
Job job = new Job(conf, "Terms");
job.setJarByClass(wordCount.class);

// Output key/value pairs as a dictionary (remember Python)
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

// Set the mapper and reducer classes
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);

// Set the input and output formats. In this case, plain TEXT
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
I know how to convert a text file to a sequence file, and I know how to do the opposite. That isn't the issue here. I couldn't find any example of actually doing this inside a Hadoop program, which is why I am stuck.
What I want is for this program to write its key/value pairs to a sequence file instead of a text file.
I also want to know how to read IN a sequence file with the Mapper.
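For what it's worth, here is my guess at what such a mapper might look like, assuming it consumes the (Text, IntWritable) pairs written by the reducer above (the class name SequenceMapper is just something I made up):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public static class SequenceMapper extends Mapper<Text, IntWritable, Text, IntWritable> {
    @Override
    public void map(Text key, IntWritable value, Context context)
            throws IOException, InterruptedException {
        // With a sequence file as input, the pairs arrive already deserialized,
        // so there is no line parsing -- just pass them through (or transform them).
        context.write(key, value);
    }
}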
Any help would be greatly appreciated.
I believe it suffices to change the input and output formats. The key/value pairs should be the same once things are encoded/decoded correctly. So use:

job.setInputFormatClass(SequenceFileInputFormat.class);

&

job.setOutputFormatClass(SequenceFileOutputFormat.class);

Give it a try, as I have not done this in a while...
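For completeness, those classes come from the new-API packages (this is from memory, so double-check):

import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

Everything else in the driver (output key/value classes, mapper, reducer) stays the same, and the reducer's context.write(key, new IntWritable(sum)) does not change at all; the configured OutputFormat alone decides how the pairs are serialized on disk.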