Java IO: Reading a file that is still being written


I am creating a program which needs to read from a file that is still being written.

The main question is this: If the read and write will be performed using InputStream and OutputStream classes running on a separate thread, what are the catches and edge cases that I will need to be aware of in order to prevent data corruption?

In case anyone is wondering whether I have considered other, non-InputStream-based approaches: yes, I have, but unfortunately that is not possible in this project, since the program uses libraries that only work with InputStream and OutputStream.

Also, several readers have asked why this complication is necessary. Why not start reading after the file has been written completely?

The reason is efficiency. The program will perform the following steps:

  1. Download a series of byte chunks of 1.5 MB each. The program will receive thousands of such chunks, totalling up to 30 GB. Chunks are downloaded concurrently in order to maximize bandwidth, so they may arrive out of order.
  2. Send each chunk for processing as soon as it arrives. Note that chunks are sent for processing in order: if chunk m arrives before chunk m-1 does, it is buffered on disk until chunk m-1 has arrived and been sent for processing.
  3. Process the chunks in order, from chunk 0 up to chunk n, until every chunk has been processed.
  4. Send the processed result back.

If we were to wait for the whole file to be transferred, it would introduce a huge delay into what is supposed to be a real-time system.
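The ordering logic in steps 1–3 can be sketched independently of the disk buffering. `OrderedDispatcher` is a hypothetical name for illustration; the real program buffers out-of-order chunks on disk, while this minimal in-memory sketch only shows how arrivals in any order get dispatched strictly in order:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Buffers chunks that arrive out of order and dispatches them strictly in order. */
class OrderedDispatcher {
    private final Map<Integer, byte[]> pending = new HashMap<>();
    private final Consumer<byte[]> processor;
    private int nextIndex = 0; // index of the next chunk we are allowed to process

    OrderedDispatcher(Consumer<byte[]> processor) {
        this.processor = processor;
    }

    /** Called by download threads as each chunk arrives, in any order. */
    synchronized void onChunkArrived(int index, byte[] data) {
        pending.put(index, data);
        // Flush the longest contiguous run of chunks starting at nextIndex.
        byte[] next;
        while ((next = pending.remove(nextIndex)) != null) {
            processor.accept(next);
            nextIndex++;
        }
    }
}
```

If chunk 1 arrives before chunk 0, it simply sits in `pending` until chunk 0 shows up, at which point both are flushed in order.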


There are 3 answers

Kayaman (accepted answer)

So your problem (as you've now clarified it) is that you can't start processing until chunk #1 has arrived, and you need to buffer every chunk #N (N > 1) until you can process it.

I would write each chunk to its own file and create a custom InputStream that reads the chunks in order. While a chunk is downloading, its file would be named something like chunk.1.downloading, and when the whole chunk has been downloaded it would be renamed to chunk.1.

The custom InputStream checks whether the file chunk.N exists (where N = 1...X). If not, it blocks. Each time a chunk finishes downloading, the InputStream is notified and checks whether the downloaded chunk is the next one to be processed. If so, it reads as normal; otherwise it blocks again.
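A minimal sketch of such a stream, under the assumptions above (class name and the chunk.N naming convention are from this answer, not from any library; the rename-on-completion trick guarantees that an existing chunk.N file is always complete):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Reads chunk files chunk.1, chunk.2, ... in order, blocking until
 * the next completed chunk file appears on disk. The downloader writes
 * to chunk.N.downloading and renames to chunk.N on completion.
 */
class ChunkSequenceInputStream extends InputStream {
    private final File dir;
    private int next = 1;            // index of the chunk being read
    private InputStream current;     // stream over the current chunk file
    private boolean finished;        // true once no more chunks will arrive

    ChunkSequenceInputStream(File dir) {
        this.dir = dir;
    }

    /** Call when the download is complete and no more chunks will arrive. */
    synchronized void endOfChunks() {
        finished = true;
        notifyAll();
    }

    /** Call each time a chunk has been renamed to its final name. */
    synchronized void chunkCompleted() {
        notifyAll();
    }

    @Override
    public int read() throws IOException {
        while (true) {
            if (current != null) {
                int b = current.read();
                if (b >= 0) return b;
                current.close();     // current chunk exhausted, move on
                current = null;
                next++;
            }
            File f = new File(dir, "chunk." + next);
            synchronized (this) {
                while (!f.exists()) {
                    if (finished) return -1;  // all chunks consumed
                    try {
                        wait();               // block until notified of a new chunk
                    } catch (InterruptedException e) {
                        throw new IOException("interrupted while waiting for chunk", e);
                    }
                }
            }
            current = new FileInputStream(f);
        }
    }
}
```

A production version would also override read(byte[], int, int) for throughput; the single-byte read here just keeps the blocking logic visible.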

AudioBubble

You should use PipedInputStream and PipedOutputStream:

import java.io.*;

static Thread newCopyThread(InputStream is, OutputStream os) {
    return new Thread(() -> {
        byte[] buffer = new byte[2048];
        try {
            int size;
            while ((size = is.read(buffer)) >= 0) {
                os.write(buffer, 0, size);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Close in finally so the streams are released even on error.
            try { is.close(); } catch (IOException ignored) {}
            try { os.close(); } catch (IOException ignored) {}
        }
    });
}

public static void main(String[] args) throws IOException, InterruptedException {
    ByteArrayInputStream bi = new ByteArrayInputStream("abcdefg".getBytes());
    PipedInputStream is = new PipedInputStream();
    PipedOutputStream os = new PipedOutputStream(is);
    Thread p = newCopyThread(bi, os);
    Thread c = newCopyThread(is, System.out);
    p.start();
    c.start();
    p.join();
    c.join();
}
Joop Eggen

Use a RandomAccessFile. Via getChannel (or similar) you could also use a ByteBuffer.

You will not be able to "insert" or "delete" parts in the middle of the file. For such a purpose your original approach would be fine, but using two files.

For concurrency: to keep things in sync you could maintain a single object model of the file and make changes there. Only the pending changes need to be kept in memory; other hierarchical data can be reread and reparsed as needed.
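A minimal sketch of the RandomAccessFile idea, assuming the reader knows the final size in advance so it can tell "writer hasn't caught up yet" apart from end of file (class name, method, and the 50 ms poll interval are illustrative choices, not part of the answer above):

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Tails a file that another thread or process is still appending to. */
class TailingReader {
    /**
     * Reads totalSize bytes from the file at path, sleeping briefly
     * whenever the writer has not yet produced the next bytes.
     */
    static byte[] readFully(String path, long totalSize)
            throws IOException, InterruptedException {
        byte[] result = new byte[(int) totalSize];
        int done = 0;
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            while (done < totalSize) {
                long available = raf.length() - done;
                if (available <= 0) {
                    Thread.sleep(50);  // writer not caught up yet; poll again
                    continue;
                }
                raf.seek(done);
                int n = raf.read(result, done,
                        (int) Math.min(available, totalSize - done));
                if (n > 0) done += n;
            }
        }
        return result;
    }
}
```

Polling on length() is the crude version; a channel-based variant could read into a ByteBuffer instead, but the grow-and-reread pattern is the same.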