What I am trying to do is convert a sequence file on HDFS that contains XML data into .xml files on HDFS.

I searched on Google and found the code below. I modified it to fit my needs; this is the result:

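import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SeqFileWriterCls {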
    public static void main(String args[]) throws Exception {
        System.out.println("Reading Sequence File");
        Path path = new Path("seq_file_path/seq_file.seq");
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Writer writer = null;
        SequenceFile.Reader reader = null;
        FSDataOutputStream fwriter = null;
        OutputStream fowriter = null;
        try {
            reader = new SequenceFile.Reader(fs, path, conf);
            //writer = new SequenceFile.Writer(fs, conf,out_path,Text.class,Text.class);
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);

            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);

            while (reader.next(key, value)) {
                // I am just editing the path so that the key becomes the file name and the value becomes the file's content
                Path out_path = new Path(""+key);
                String string_path = out_path.toString();
                String clear_path=string_path.substring(string_path.lastIndexOf("/")+1);

                Path finalout_path = new Path("path"+clear_path);
                System.out.println("the final path is "+finalout_path);
                fwriter = fs.create(finalout_path);
                fwriter.writeUTF(value.toString());
                fwriter.close();
                FSDataInputStream in = fs.open(finalout_path);
                String s = in.readUTF();
                System.out.println("file has: -" + s);
                //fowriter = fs.create(finalout_path); 
                //fowriter.write(value.toString());
                System.out.println(key + "  <===>  :" + value.toString());
                System.exit(0);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            IOUtils.closeStream(reader);
            fs.close();
        }

    }
}

I am using FSDataOutputStream to write the data to HDFS, and the method I call is writeUTF. The issue is that when I write to the HDFS file, some additional characters appear at the start of the data. But when I print the data, I can't see the extra characters.

I tried using writeChars(), but even that doesn't work.

Is there any way to avoid this? Or is there any other way to write the data to HDFS?

Please help.

There is 1 answer

Adam Berkecz

The JavaDoc of the writeUTF(String str) method says the following:

Writes a string to the underlying output stream using modified UTF-8 encoding in a machine-independent manner. First, two bytes are written to the output stream as if by the writeShort method giving the number of bytes to follow. This value is the number of bytes actually written out, not the length of the string. Following the length, each character of the string is output, in sequence, using the modified UTF-8 encoding for the character. (...)
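
To make the length prefix visible, here is a minimal sketch that runs writeUTF against an in-memory stream instead of HDFS. FSDataOutputStream extends java.io.DataOutputStream, so the byte layout should be the same. The class name WriteUtfPrefixDemo and the sample string are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original program.
class WriteUtfPrefixDemo {

    // Returns the exact bytes that writeUTF emits for the given string.
    static byte[] writeUtf(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // An in-memory stream should not fail; rethrow unchecked to keep the demo short.
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        String xml = "<a/>";
        byte[] withPrefix = writeUtf(xml);
        byte[] plain = xml.getBytes(StandardCharsets.UTF_8);

        System.out.println("writeUTF bytes: " + withPrefix.length);                // 6
        System.out.println("plain UTF-8 bytes: " + plain.length);                  // 4
        System.out.println("prefix bytes: " + withPrefix[0] + " " + withPrefix[1]); // 0 4
    }
}
```

The two leading bytes hold the encoded length, and they are what shows up as extra characters when the file is opened as text. readUTF reads that prefix back, so a write-then-read round trip hides it. Because the length is stored in two bytes, writeUTF also throws UTFDataFormatException for strings whose encoded form exceeds 65535 bytes, which matters for larger XML documents.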

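That two-byte length prefix is the extra data at the start of your file. readUTF consumes it again, which is why you don't see it when you print the data. writeBytes(String str) and writeChars(String str) write no prefix, but neither produces UTF-8: writeBytes keeps only the low eight bits of each character, so it is safe only for ASCII content, and writeChars writes two bytes per character. For a plain XML file, passing value.toString().getBytes(StandardCharsets.UTF_8) to write(byte[]) should avoid all of these problems.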
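
If the goal is a plain UTF-8 .xml file, another option is to encode the string yourself and write the raw bytes. The sketch below uses an in-memory stream as a stand-in for the stream returned by fs.create(finalout_path), so it runs without Hadoop. PlainUtf8Writer, writeText and writtenBytes are hypothetical names, not Hadoop APIs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Hadoop or the original program.
class PlainUtf8Writer {

    // Writes the text as raw UTF-8 bytes: no length prefix, no two-byte characters.
    static void writeText(OutputStream out, String text) throws IOException {
        out.write(text.getBytes(StandardCharsets.UTF_8));
    }

    // Runs writeText against an in-memory stream and returns what was written.
    static byte[] writtenBytes(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            writeText(buf, text);
        } catch (IOException e) {
            // An in-memory stream should not fail; rethrow unchecked to keep the demo short.
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = writtenBytes("<note>caf\u00e9</note>");
        // Decoding the bytes as UTF-8 gives back the original text unchanged.
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

In the program from the question, this would correspond to replacing the fwriter.writeUTF(value.toString()) call with fwriter.write(value.toString().getBytes(StandardCharsets.UTF_8)), since FSDataOutputStream is an OutputStream. The readUTF check would then have to go as well, because the file no longer starts with a length prefix.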