File writing from multiple threads.


I have an application A which calls another application B, which does some calculation and writes to a file, File.txt. A invokes multiple instances of B through multiple threads, and each instance tries to write to the same File.txt. Here comes the actual problem: since multiple threads try to access the same file, the file access throws an exception, which is to be expected.

I tried an approach of using a concurrent queue in a singleton class: each instance of B adds its output to the queue, and another thread in this class takes care of dequeuing the items and writing them to File.txt. The queue is drained synchronously and the write operation succeeds. This works fine (a rough sketch of what I mean is below).
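A minimal sketch of that singleton producer/consumer writer, assuming C#/.NET; the class name `LogWriter` and the flush-per-line choice are illustrative, not taken from my actual code:

```csharp
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public sealed class LogWriter
{
    public static LogWriter Instance { get; } = new LogWriter();

    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
    private readonly Task _consumer;

    private LogWriter()
    {
        // Single consumer task: the only code that ever touches File.txt.
        _consumer = Task.Run(() =>
        {
            using var writer = new StreamWriter("File.txt", append: true);
            foreach (var line in _queue.GetConsumingEnumerable())
            {
                writer.WriteLine(line);
                writer.Flush(); // flush per line to limit loss if the process dies
            }
        });
    }

    // Called by each instance of B; returns immediately.
    public void Enqueue(string line) => _queue.Add(line);

    // Call on shutdown so queued items are drained before exit.
    public void Complete()
    {
        _queue.CompleteAdding();
        _consumer.Wait();
    }
}
```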

Even with many threads and many items in the queue the file writing works, but if for some reason my queue crashes or the process stops abruptly, all the information that was queued but not yet written to the file is lost.

If I make the file writing synchronous from B without using the queue, it will be slow because every write has to contend for the file lock, but there is less chance of data being lost, since B writes to the file immediately (sketched below).
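For comparison, a minimal sketch of that synchronous, lock-per-write variant, assuming the instances of B run as threads inside one process (a named Mutex would be needed if they were separate processes); `SynchronousLog` is just an illustrative name:

```csharp
using System.IO;

public static class SynchronousLog
{
    private static readonly object _fileLock = new object();

    // Each instance of B calls this directly; every call contends for the lock.
    public static void Write(string line)
    {
        lock (_fileLock)
        {
            File.AppendAllText("File.txt", line + System.Environment.NewLine);
        }
    }
}
```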

What would be the best approach or design to handle this scenario? I don't need a response after the file writing is completed, and I can't make B wait for the file writing to finish.

Would async/await file writing be of any use here?


There are 2 answers

duffymo

I think what you've done is the best that can be done. You may have to tune your producer/consumer queue solution if there are still problems, but it seems to me that you've done rather well with this approach.

If an in-memory queue isn't the answer, perhaps externalizing that to a message queue and a pool of listeners would be an improvement.

Relational databases and transaction managers were born to solve this problem. Why continue with a file-based solution? Is it possible to explore an alternative?

shay__

Is there a better approach or design to handle this scenario?

You can make each producer thread write to its own rolling file instead of queuing the operation. Every X seconds the producers move to new files, and an aggregation thread wakes up, reads the previous files (one per producer) and writes the results to the final File.txt output file. No read/write locks are required here.

This ensures safe recovery since the rolling files exist until you process and delete them.

This also means that you always write to disk, which is much slower than queuing tasks in memory and writing to disk in bulk. But that's the price you pay for consistency.
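A rough sketch of the idea in C#; the file-naming scheme, the 10-second roll interval, and the `RollingFiles` name are arbitrary choices for illustration:

```csharp
using System;
using System.IO;
using System.Threading;

public static class RollingFiles
{
    // Each producer writes only to its own file for the current time bucket,
    // so no locking between producers is needed.
    public static void WriteRecord(int producerId, string record)
    {
        long bucket = DateTimeOffset.UtcNow.ToUnixTimeSeconds() / 10; // roll every 10 s
        string path = $"producer-{producerId}-{bucket}.part";
        File.AppendAllText(path, record + Environment.NewLine);
    }

    // Aggregator thread: merges finished buckets into File.txt, then deletes them.
    public static void AggregateLoop(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            Thread.Sleep(TimeSpan.FromSeconds(10));
            long currentBucket = DateTimeOffset.UtcNow.ToUnixTimeSeconds() / 10;
            foreach (var path in Directory.GetFiles(".", "producer-*.part"))
            {
                // Skip files that producers may still be appending to.
                long bucket = long.Parse(Path.GetFileNameWithoutExtension(path).Split('-')[2]);
                if (bucket >= currentBucket) continue;

                File.AppendAllText("File.txt", File.ReadAllText(path));
                File.Delete(path); // the rolling file survives until it has been merged
            }
        }
    }
}
```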

Would async/await file writing be of any use here?

Using asynchronous IO has nothing to do with this. The problems you mentioned were 1) shared resources (the output file) and 2) lack of consistency (when the queue crashes), neither of which is what async programming is about.

The reason async is in the picture is that I don't want to delay the existing work done by B because of this file writing operation.

async would indeed help you with that. Whatever pattern you choose to implement (to solve the original problem), it can always be made async by simply using the asynchronous IO APIs.
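For example, a hedged sketch of an async write path using the built-in asynchronous file API (`AsyncWriter` is an illustrative name; the surrounding queue or rolling-file design stays the same):

```csharp
using System.IO;
using System.Threading.Tasks;

public static class AsyncWriter
{
    // Writes one line without blocking the calling thread while the disk IO is in flight.
    public static async Task AppendLineAsync(string line)
    {
        // File.AppendAllTextAsync is available from .NET Core 2.0 / .NET Standard 2.1 onward.
        await File.AppendAllTextAsync("File.txt", line + System.Environment.NewLine);
    }
}
```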