Append to existing sequence file

Can someone please provide a sample code snippet showing how to append a file to an existing sequence file?

Below is the code I used to append to an existing sequence file, outputfile. Reading the sequence file after the append throws a checksum error:

Problem opening checksum file: /Users/{homedirectory}/Desktop/Sample/SequenceFile/outputfile. Ignoring exception: java.io.EOFException

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class AppendSequenceFile {

    /**
     * @param args
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {

        Configuration conf = new Configuration();

        FileSystem fs = FileSystem.get(conf);
        Path inputFile = new Path("/Users/{homedirectory}/Desktop/Sample/SequenceFile/sampleAppendTextFiles");
        Path sequenceFile = new Path("/Users/{homedirectory}/Desktop/Sample/SequenceFile/outputfile");
        Text key = new Text();
        Text value = new Text();
        SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf,
                sequenceFile, key.getClass(), value.getClass());
        FileStatus[] fStatus = fs.listStatus(inputFile);

        for (FileStatus fst : fStatus) {
            StringBuilder str = new StringBuilder();
            // getLen() is the file size in bytes
            System.out.println("Processing file : " + fst.getPath().getName() + " and the size is : " + fst.getLen());
            key.set(fst.getPath().getName());
            // Read the file line by line; lines are concatenated without separators
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(fs.open(fst.getPath()), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    str.append(line);
                }
            }
            value.set(str.toString());
            writer.append(key, value);

        }
    }
}

Sequence file reader:

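import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileReader {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/Users/{homedirectory}/Desktop/Sample/SequenceFile/outputfile");
        SequenceFile.Reader reader = null;
        try {
            reader = new SequenceFile.Reader(fs, path, conf);
            Text key = new Text();
            Text value = new Text();
            // Print every key/value record in the sequence file
            while (reader.next(key, value)) {
                System.out.println(key);
                System.out.println(value);
            }
        } finally {
            IOUtils.closeStream(reader);
        }
    }
}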

Thanks in advance.

There are 2 answers

Amit

I have not done this myself, but browsing the Hadoop API documentation I found the following.

You can use this createWriter overload to create the Writer; see the SequenceFile API documentation:

public static org.apache.hadoop.io.SequenceFile.Writer createWriter(
        FileContext fc,
        Configuration conf,
        Path name,
        Class keyClass,
        Class valClass,
        org.apache.hadoop.io.SequenceFile.CompressionType compressionType,
        CompressionCodec codec,
        org.apache.hadoop.io.SequenceFile.Metadata metadata,
        EnumSet<CreateFlag> createFlag,
        org.apache.hadoop.fs.Options.CreateOpts... opts) throws IOException

In this API, the createFlag argument lets you pass CreateFlag.APPEND to request append mode.

user12105468

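Try closing the SequenceFile.Writer used for the append before opening the SequenceFile.Reader on the same file. The code in the question never calls writer.close(), so buffered records and checksum data may not have been written out by the time the reader opens the file.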
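
To illustrate why closing matters, here is a minimal sketch that uses only the plain JDK rather than Hadoop. It is an analogy, not SequenceFile code: a java.io.BufferedWriter stands in for the SequenceFile.Writer, and the class and method names (CloseBeforeRead, sizesBeforeAndAfterClose) are made up for this example. It writes one small record, checks how many bytes are visible on disk while the writer is still open, and checks again after close().

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CloseBeforeRead {

    // Returns {bytes on disk while the writer is still open, bytes on disk after close()}.
    public static long[] sizesBeforeAndAfterClose() throws IOException {
        Path tmp = Files.createTempFile("close-demo", ".txt");
        try {
            long before;
            try (BufferedWriter writer = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                // A small record normally stays in the writer's in-memory buffer.
                writer.write("key\tvalue\n");
                before = Files.size(tmp);
            } // leaving the try block calls close(), which flushes the buffer to disk
            long after = Files.size(tmp);
            return new long[] {before, after};
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        long[] sizes = sizesBeforeAndAfterClose();
        System.out.println("visible before close: " + sizes[0] + " bytes");
        System.out.println("visible after close:  " + sizes[1] + " bytes");
    }
}
```

A reader opened before close() would see an incomplete file. In the Hadoop code from the question, the equivalent step would be calling writer.close() (or IOUtils.closeStream(writer) in a finally block) once the loop has finished appending.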