How to optimize C# concurrent file write performance


I'm trying to optimize the performance of creating a large number of small files on an SSD.

    // Each string[] holds { file path, file content }; the bag is populated elsewhere.
    ConcurrentBag<string[]> cb = new ConcurrentBag<string[]>();
    cb.AsParallel().ForAll(fa => File.WriteAllText(fa[0], fa[1]));

The ConcurrentBag<string[]> contains 80048 items, and cb.Sum(gbc => Encoding.UTF8.GetByteCount(gbc[1])) returns 393441217 bytes (roughly 393 MB, or about 4.9 KB per file on average).

Elsewhere I call xml.Save(), which creates a single ~750 MB file.

Writing the small files takes 3 minutes and 30 seconds to complete. The single xml.Save() call takes 20 seconds.

I understand there is some overhead in handling all the separate write operations, but 3 minutes and 30 seconds still seems a bit long. I already tried parallelization with ForAll, which helped considerably (before that it took 6-8 minutes to complete). What other modifications could I make to my code to speed up the bulk file creation?


There are 2 answers

Daniel Luberda (best answer)

Actually, multiple simultaneous IO operations can slow things down quite a lot, especially on traditional disks. I recommend putting the files in a ConcurrentQueue and writing them one at a time from a single consumer loop.

You could also switch to StreamWriter and set the buffer size explicitly, which may increase write speed:

    ConcurrentQueue<string[]> concurrentQueue = new ConcurrentQueue<string[]>();

    // populate with some data
    for (int i = 0; i < 5000; i++)
    {
        concurrentQueue.Enqueue(new string[] { Guid.NewGuid().ToString(), Guid.NewGuid().ToString() });
    }

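    const int BufferSize = 65536;  // change it to your needs

    // Drain the queue; the loop ends as soon as the queue is empty
    // (the original while (true) loop would spin forever).
    string[] currentElement;
    while (concurrentQueue.TryDequeue(out currentElement))
    {
        // currentElement[0] is the file path, currentElement[1] the content.
        // append: false overwrites an existing file, as File.WriteAllText does.
        using (var sw = new StreamWriter(currentElement[0], false, Encoding.UTF8, BufferSize))
        {
            sw.Write(currentElement[1]);
        }
    }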
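
Which buffer size works best depends on the disk and the file sizes, so it is worth measuring instead of guessing. The following is a minimal sketch of such a measurement, not part of the original answer: the class name `BufferSizeTiming`, the helper `TimeWrites`, the file count, and the candidate buffer sizes are all assumptions chosen for illustration. It writes small files into a temporary directory and times each buffer size with a Stopwatch.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;

static class BufferSizeTiming
{
    // Hypothetical helper: writes `count` files with the given content into a
    // fresh temporary directory and returns how long the writes took.
    public static TimeSpan TimeWrites(int count, string content, int bufferSize)
    {
        string dir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString());
        Directory.CreateDirectory(dir);
        try
        {
            var stopwatch = Stopwatch.StartNew();
            for (int i = 0; i < count; i++)
            {
                string path = Path.Combine(dir, i + ".txt");
                using (var sw = new StreamWriter(path, false, Encoding.UTF8, bufferSize))
                {
                    sw.Write(content);
                }
            }
            stopwatch.Stop();
            return stopwatch.Elapsed;
        }
        finally
        {
            // Clean up the temporary files whether or not the writes succeeded.
            Directory.Delete(dir, recursive: true);
        }
    }

    public static void Main()
    {
        // About 5 KB per file, close to the average file size in the question.
        string content = new string('x', 5000);

        // Candidate sizes are arbitrary; adjust them to your needs.
        foreach (int bufferSize in new[] { 4096, 65536, 262144 })
        {
            TimeSpan elapsed = TimeWrites(1000, content, bufferSize);
            Console.WriteLine($"{bufferSize,7} bytes: {elapsed.TotalMilliseconds:F0} ms");
        }
    }
}
```

Timings from a run like this are noisy (OS caching, antivirus scanning, and other disk activity all interfere), so repeat the measurement a few times before drawing conclusions.
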

silver

You should also try ForEach (for example Parallel.ForEach) instead of ForAll. The post at http://reedcopsey.com/2010/02/03/parallelism-in-net-part-8-plinqs-forall-method/ gives some good reasons.

The post's guideline is:

The ForAll extension method should only be used to process the results of a parallel query, as returned by a PLINQ expression
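
As a rough illustration of that suggestion, here is a minimal sketch that replaces the question's `AsParallel().ForAll(...)` call with `Parallel.ForEach`. The class name `ParallelFileWriter`, the method `WriteAll`, and the degree of parallelism of 4 are assumptions made for this example; `ParallelOptions.MaxDegreeOfParallelism` is used to cap the number of simultaneous writes, and the best value has to be found by measuring.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

static class ParallelFileWriter
{
    // Hypothetical helper: each item is { file path, file content },
    // the same string[] layout the question uses.
    public static void WriteAll(IEnumerable<string[]> items, int maxDegreeOfParallelism)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };
        Parallel.ForEach(items, options, fa => File.WriteAllText(fa[0], fa[1]));
    }

    public static void Main()
    {
        // Build some sample data in a temporary directory.
        string dir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString());
        Directory.CreateDirectory(dir);

        var items = new List<string[]>();
        for (int i = 0; i < 100; i++)
        {
            items.Add(new[] { Path.Combine(dir, i + ".txt"), "content " + i });
        }

        // 4 is an arbitrary starting point, not a recommendation.
        WriteAll(items, 4);

        Console.WriteLine(Directory.GetFiles(dir).Length + " files written");
        Directory.Delete(dir, recursive: true);
    }
}
```
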