I have a MapReduce program whose output is currently all in text files. A sample of the program is below. What I do not understand is how to output the key/value pairs from the reducer in sequence file format. And no, I can't just use a SequenceFileFormat specifier, because I'm using the Hadoop 0.20 library.
So what do I do? Below is a sample. The word count program is just one small part of my larger program; if I know how to do it with one, I can do it with the rest. Please help.

Word Count Reducer:
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        System.out.println("reducer.output: " + key.toString() + " " + sum);
        context.write(key, new IntWritable(sum)); // RIGHT HERE!! OUTPUTS TO TEXT
    }
}
Now here is the main program that runs this (I left out the mapper and other irrelevant details):
Configuration conf = new Configuration();
Job job = new Job(conf, "Terms");
job.setJarByClass(wordCount.class);

// Output key/value pairs as a dictionary (remember Python)
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

// Set the mapper and reducer classes
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);

// Set the input and output formats. In this case, plain TEXT
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
I know how to convert a text file to a sequence file, and I know how to do the opposite. That isn't the issue here. I couldn't find any example of actually doing this inside a Hadoop program, which is why I am stuck.
What I want is for this program to write its key/value pairs to a sequence file instead of a text file.
I also want to know how to read IN a sequence file with the Mapper.
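For what it's worth, here is my guess at what such a mapper might look like, assuming it consumes the (Text, IntWritable) pairs written by the reducer above (the class name SequenceMapper is just something I made up):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public static class SequenceMapper extends Mapper<Text, IntWritable, Text, IntWritable> {
    @Override
    public void map(Text key, IntWritable value, Context context)
            throws IOException, InterruptedException {
        // With a sequence file as input, the pairs arrive already deserialized,
        // so there is no line parsing -- just pass them through (or transform them).
        context.write(key, value);
    }
}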
Any help would be greatly appreciated.
I believe it suffices to change the input and output formats. The key/value pairs should be the same once things are encoded/decoded correctly. So use:

job.setInputFormatClass(SequenceFileInputFormat.class);

&

job.setOutputFormatClass(SequenceFileOutputFormat.class);

Give it a try, as I have not done this in a while...
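For completeness, those classes come from the new-API packages (this is from memory, so double-check):

import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

Everything else in the driver (output key/value classes, mapper, reducer) stays the same, and the reducer's context.write(key, new IntWritable(sum)) does not change at all; the configured OutputFormat alone decides how the pairs are serialized on disk.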