Currently I have a .txt file of about 170,000 jpg file names and I read them all into a List (fileNames).
I want to search ONE folder (this folder has sub-folders) to check if each file in fileNames exists in this folder and if it does, copy it to a new folder.
As a rough estimate, each search and copy for a file name in fileNames takes about 0.5 seconds. For 170,000 names that is about 85,000 seconds, or roughly 24 hours for my app to search for every single file name using 1 thread! Obviously this is too long, so I want to speed the process up. What is the best way to go about doing this using multi-threading?
I was thinking of creating 20 separate threads, splitting my list (fileNames) into 20 smaller lists, and searching for the files simultaneously. For example, I would have 20 different threads each executing the code below at the same time:
    foreach (string str in fileNames)
    {
        foreach (var file in Directory.GetFiles(folderToCheckForFileName, str, SearchOption.AllDirectories))
        {
            string combinedPath = Path.Combine(newTargetDirectory, Path.GetFileName(file));
            if (!File.Exists(combinedPath))
            {
                File.Copy(file, combinedPath);
            }
        }
    }
Update: my solution is below.
    // Enumerate the folder tree once instead of once per file name.
    string[] folderToCheckForFileNames = Directory.GetFiles("C:\\Users\\Alex\\Desktop\\ok", "*.jpg", SearchOption.AllDirectories);
    foreach (string str in fileNames)
    {
        // Compare the current name against every file on disk in parallel.
        Parallel.ForEach(folderToCheckForFileNames, currentFile =>
        {
            string filename = Path.GetFileName(currentFile);
            if (str == filename)
            {
                string combinedPath = Path.Combine(targetDir, filename);
                // Skip files that have already been copied to the target folder.
                if (!File.Exists(combinedPath))
                {
                    File.Copy(currentFile, combinedPath);
                    Console.WriteLine("FOUND A MATCH AND COPIED " + currentFile);
                }
            }
        });
    }
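The solution above still compares every name in fileNames against every file on disk. A possible refinement, shown here only as a minimal sketch, is to load the names into a HashSet so each file found on disk needs a single lookup. The names JpgCopier and CopyMatches are hypothetical, the case-insensitive comparison assumes a Windows file system, and keeping only the first file per name when several sub-folders hold the same name is an assumption that mirrors the File.Exists check above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original post.
static class JpgCopier
{
    // Copies every .jpg under sourceRoot whose file name appears in wantedNames
    // into targetDir and returns the number of files copied.
    public static int CopyMatches(IEnumerable<string> wantedNames, string sourceRoot, string targetDir)
    {
        // Assumption: names are compared case-insensitively, as on Windows.
        var wanted = new HashSet<string>(wantedNames, StringComparer.OrdinalIgnoreCase);
        Directory.CreateDirectory(targetDir);

        // Walk the tree once and keep one source path per distinct file name,
        // so two threads never write to the same destination.
        List<string> matches = Directory
            .EnumerateFiles(sourceRoot, "*.jpg", SearchOption.AllDirectories)
            .Where(path => wanted.Contains(Path.GetFileName(path)))
            .GroupBy(path => Path.GetFileName(path), StringComparer.OrdinalIgnoreCase)
            .Select(group => group.First())
            .ToList();

        int copied = 0;
        Parallel.ForEach(matches, source =>
        {
            string destination = Path.Combine(targetDir, Path.GetFileName(source));
            if (!File.Exists(destination))
            {
                File.Copy(source, destination);
                Interlocked.Increment(ref copied); // thread-safe counter
            }
        });
        return copied;
    }
}
```

It could be called as `JpgCopier.CopyMatches(fileNames, "C:\\Users\\Alex\\Desktop\\ok", targetDir);`. Since the copy itself is disk-bound, the parallel loop may help less than the switch from a nested comparison to a set lookup.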
Thank you everyone for your contributions! Greatly Appreciated!
Instead of using an ordinary foreach statement for your search, you should use Parallel LINQ (PLINQ). PLINQ combines the simplicity and readability of LINQ syntax with the power of parallel programming, much like code that targets the Task Parallel Library. It shields you from low-level thread manipulation, and from the hard-to-find, hard-to-debug exceptions that tend to come with it, while splitting your work among many threads. So you might do something like this: