How can I read a specific number of bytes from a FileInputStream object using buffers

I have a series of objects stored within a file concatenated as below:

sizeOfFile1 || file1 || sizeOfFile2 || file2 ...

Each size is a serialized long object, and each file is stored as its raw bytes.

I am trying to extract the files from the input file. Below is my code:

FileInputStream fileInputStream = new FileInputStream("C:\\Test.tst");
ObjectInputStream objectInputStream = new ObjectInputStream(fileInputStream);
while (fileInputStream.available() > 0)
{
  long size = (long) objectInputStream.readObject();
  FileOutputStream fileOutputStream = new FileOutputStream("C:\\" + size + ".tst");
  BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
  int chunkSize = 256;
  final byte[] temp = new byte[chunkSize];
  int finalChunkSize = (int) (size % chunkSize);
  final byte[] finalTemp = new byte[finalChunkSize];
  while(fileInputStream.available() > 0 && size > 0)
  {
    if (fileInputStream.available() > finalChunkSize)
    {
      int i = fileInputStream.read(temp);
      bufferedOutputStream.write(temp, 0, i);
      size = size - i;
    }
    else
    {
      int i = fileInputStream.read(finalTemp);
      bufferedOutputStream.write(finalTemp, 0, i);
      size = 0;
    }
  }
  bufferedOutputStream.close();
}
objectInputStream.close();

My code fails after it reads the first sizeOfFile; it just reads the rest of the input file into one file when there are multiple files stored.

Can anyone see the issue here?

Regards.

There are 4 answers

1
user207421 On

Wrap it in a DataInputStream and use readFully(byte[]).

But I question the design. Serialization and random access do not mix. It sounds like you should be using a database.

NB you are misusing available(). See the method's Javadoc page. It is never correct to use it as a count of the total number of bytes in the stream. There are few if any correct uses of available(), and this isn't one of them.
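
A minimal sketch of that suggestion, for illustration only. It assumes each size was written as eight raw bytes (for example with DataOutputStream.writeLong) rather than as a serialized Long object, and that every segment fits in a byte array. The class and method names (SegmentReader, readSegments) are made up for the example. End of input is detected by the EOFException from readLong(), not by available():

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class SegmentReader {

    // Reads every (long size, raw bytes) record from the stream until the input ends.
    // Assumption: sizes are 8-byte big-endian values, as DataOutputStream.writeLong writes them.
    static List<byte[]> readSegments(InputStream raw) throws IOException {
        List<byte[]> segments = new ArrayList<>();
        DataInputStream in = new DataInputStream(raw);
        while (true) {
            long size;
            try {
                size = in.readLong();       // throws EOFException when the input ends
            } catch (EOFException endOfInput) {
                break;                      // no available() call is needed
            }
            if (size < 0 || size > Integer.MAX_VALUE) {
                throw new IOException("Unsupported segment size: " + size);
            }
            byte[] segment = new byte[(int) size];
            in.readFully(segment);          // reads exactly size bytes or throws EOFException
            segments.add(segment);
        }
        return segments;
    }

    public static void main(String[] args) throws IOException {
        // Build a small in-memory sample laid out as size || bytes || size || bytes.
        ByteArrayOutputStream sample = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(sample);
        out.writeLong(3);
        out.write(new byte[] {1, 2, 3});
        out.writeLong(2);
        out.write(new byte[] {4, 5});
        out.flush();

        List<byte[]> segments = readSegments(new ByteArrayInputStream(sample.toByteArray()));
        System.out.println(segments.size() + " segments read");
    }
}
```

Because readFully() loads a whole segment into memory, this only suits segments of moderate size; very large segments are better copied in chunks.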

4
Software Engineer On

You could try NIO instead:

FileChannel roChannel = new RandomAccessFile(file, "r").getChannel();
ByteBuffer roBuf = roChannel.map(FileChannel.MapMode.READ_ONLY, 0, SIZE);

This maps only the first SIZE bytes of the file into the buffer.
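
As a rough sketch of how a mapping could be applied to the size-prefixed layout in the question: the code below maps the whole file and walks it record by record. It assumes the sizes are 8-byte big-endian values (ByteBuffer's default byte order, which matches DataOutputStream.writeLong) and that the file is under 2 GB, the limit of a single mapping. MappedSegments and readSegments are hypothetical names, and FileChannel.open is used in place of RandomAccessFile purely for brevity:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

class MappedSegments {

    // Maps the file read-only and copies out each (long size, raw bytes) record.
    static List<byte[]> readSegments(Path file) throws IOException {
        List<byte[]> segments = new ArrayList<>();
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long fileSize = channel.size();
            if (fileSize == 0) {
                return segments;                     // nothing to map
            }
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, fileSize);
            while (map.remaining() >= Long.BYTES) {
                long size = map.getLong();           // 8-byte size prefix
                if (size < 0 || size > map.remaining()) {
                    throw new IOException("Corrupt segment size: " + size);
                }
                byte[] segment = new byte[(int) size];
                map.get(segment);                    // copies exactly size bytes and advances
                segments.add(segment);
            }
        }
        return segments;
    }
}
```

The buffer's position plays the role of the stream cursor here, so no available() check is involved; map.remaining() is an exact count of the unread mapped bytes.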

0
msknapp On

Save yourself a lot of trouble by doing one of these things:

  1. Switch to using Avro; trust me, you would be crazy not to. It's easy to learn and will accommodate schema changes. Using ObjectXXXStream is one of the worst ideas ever: as soon as you change your schema, your old files are garbage.
  2. or use Thrift
  3. or use Hibernate (but this is probably not a great option; Hibernate takes a lot of time to learn and a lot of configuration)

If you really refuse to switch to Avro, I recommend reading up on the IOUtils class from Apache Commons IO. It has a method to copy from an input stream to an output stream, saving you a lot of headaches. Unfortunately, what you want to do is a little more complicated, because you want the size prefixing each file. You might be able to use a combination of SequenceInputStream objects to do that.

There are also GZIPOutputStream and ZipOutputStream. Both live in java.util.zip, which is part of the JDK, so they need no extra jars on your classpath.

I'm not going to write an example because I honestly think you should just learn Avro or Thrift and use that.

0
eckes On

This uses DataInput to read the longs. In this particular case I am not using readFully(), as a segment might be too long to keep in memory:

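DataInputStream in = new DataInputStream(new FileInputStream(...));
byte[] buf = new byte[64*1024];
while (true) {
  long size;
  // readLong() throws EOFException once the input is exhausted
  try { size = in.readLong(); } catch (EOFException e) { break; }
  // open the output only after a size has been read, so none is left open at EOF
  OutputStream out = ...;
  while (size > 0) {
    // never request more than the bytes left in this segment
    int len = (int) Math.min(buf.length, size);
    len = in.read(buf, 0, len);
    if (len < 0) throw new EOFException("segment truncated");
    out.write(buf, 0, len);
    size -= len;
  }
  out.close();
}
in.close();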
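
One caveat: readLong() only decodes sizes that were written as eight raw bytes. A Long written with ObjectOutputStream.writeObject is encoded differently, with a serialization header and class descriptor, so the file would have to be produced by a matching writer. A minimal sketch of such a writer, with made-up names (SegmentWriter, writeSegment, pack) and in-memory data standing in for real files:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class SegmentWriter {

    // Writes one record: an 8-byte big-endian length followed by the raw bytes.
    static void writeSegment(DataOutputStream out, byte[] data) throws IOException {
        out.writeLong(data.length);
        out.write(data);
    }

    // Concatenates several payloads into the size || bytes || size || bytes layout.
    static byte[] pack(byte[]... payloads) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (byte[] payload : payloads) {
                writeSegment(out, payload);
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] packed = pack("hello".getBytes(StandardCharsets.US_ASCII),
                             "hi".getBytes(StandardCharsets.US_ASCII));
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(packed));
        System.out.println(in.readLong()); // size prefix of the first payload
    }
}
```

Anything written this way can be read back by the readLong() loop above.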