Multiple Threads searching on same folder at same time

2.8k views Asked by At

Currently I have a .txt file of about 170,000 jpg file names and I read them all into a List (fileNames).

I want to search ONE folder (this folder has sub-folders) to check if each file in fileNames exists in this folder and if it does, copy it to a new folder.

I was making a rough estimate but each search and copy for each file name in fileNames takes about .5 seconds. So 170,000 seconds is roughly 48 hours so divide by 2 that will take about 24 hours for my app to have searched for every single file name using 1 thread! Obviously this is too long so I want to narrow this down and speed the process up. What is the best way to go about doing this using multi-threading?

Currently I was thinking of making 20 separate threads and splitting my list (fileNames) into 20 different lists and search for the files simultaneously. For example I would have 20 different threads executing the below at the same time:

            foreach (string str in fileNames)
            {
                foreach (var file in Directory.GetFiles(folderToCheckForFileName, str, SearchOption.AllDirectories))
                {
                    string combinedPath = Path.Combine(newTargetDirectory, Path.GetFileName(file));
                    if (!File.Exists(combinedPath))
                    {
                        File.Copy(file, combinedPath);
                    }
                }
            }

UPDATED TO SHOW MY SOLUTION BELOW:

            string[] folderToCheckForFileNames = Directory.GetFiles("C:\\Users\\Alex\\Desktop\\ok", "*.jpg", SearchOption.AllDirectories);

            foreach(string str in fileNames)
            {
                Parallel.ForEach(folderToCheckForFileNames, currentFile =>
                  {
                      string filename = Path.GetFileName(currentFile);
                      if (str == filename)
                      {
                          string combinedPath = Path.Combine(targetDir, filename);
                          if (!File.Exists(combinedPath))
                          {
                              File.Copy(currentFile, combinedPath);
                              Console.WriteLine("FOUND A MATCH AND COPIED" + currentFile);
                          }
                      }

                  }
                );

            }

Thank you everyone for your contributions! Greatly Appreciated!

2

There are 2 answers

2
Cizaphil On BEST ANSWER

Instead of using ordinary foreach statement in doing your search, you should use parallel linq. Parallel linq combines the simplicity and readability of LINQ syntax with the power of parallel programming. Just like code that targets the Task Parallel Library. This will shield you from low level thread manipulation and probable exceptions (hard to find/debug exceptions) while splitting your work among many threads. So you might do something like this:

fileNames.AsParallel().ForAll(str =>
            {
                var files = Directory.GetFiles(folderToCheckForFileName, str, SearchOption.AllDirectories);
                files.AsParallel().ForAll(file =>
                {
                    if (!string.IsNullOrEmpty(file))
                    {
                        string combinedPath = Path.Combine(newTargetDirectory, Path.GetFileName(file));
                        if (!File.Exists(combinedPath))
                        {
                            File.Copy(file, combinedPath);
                        }
                    }
                });
            });
2
James Ko On

20 different threads won't help if your computer has fewer than 20 cores. In fact, it can make the process slower because you will 1) have to spend time context switching between each thread (which is your CPU's way of emulating more than 1 thread / core) and 2) a Thread in .NET reserves 1 MB for its stack, which is pretty hefty.

Instead, try dividing your I/O into async workloads, using Task.Run for the CPU-bound / intensive parts. Also, keep your number of Tasks to maybe 4 to 8 at the max.

Sample code:

var tasks = new Task[8];
var names = fileNames.ToArray();
for (int i = 0; i < tasks.Length; i++)
{
    int index = i;
    tasks[i] = Task.Run(() =>
    {
        for (int current = index; current < names.Length; current += 8)
        {
            // execute the workload
            string str = names[current];
            foreach (var file in Directory.GetFiles(folderToCheckForFileName, str, SearchOption.AllDirectories))
            {
                string combinedPath = Path.Combine(newTargetDirectory, Path.GetFileName(file));
                if (!File.Exists(combinedPath))
                {
                    File.Copy(file, combinedPath);
                }
            }
        }
    });
}
Task.WaitAll(tasks);