How is this parallel for not processing all elements?

811 views Asked by At

I've created this normal for loop:

    public static Dictionary<string,Dictionary<string,bool>> AnalyzeFiles(IEnumerable<string> files, IEnumerable<string> dependencies)
    {
        Dictionary<string, Dictionary<string, bool>> filesAnalyzed = new Dictionary<string, Dictionary<string, bool>>();
        foreach (var item in files)
        {
            filesAnalyzed[item] = AnalyzeFile(item, dependencies);
        }
        return filesAnalyzed;
    }

The loop just checks if each file that is in the variable "files" has all the dependencies specified in the variable "dependencies".

the "files" variable should only have unique elements because it is used as the key for the result, a dictionary, but I check this before calling the method.

The for loop works correctly and all elements are processed in single thread, so I wanted to increase the performance by changing to a parallel for loop, the problem is that not all the elements that come from the "files" variable are being processed in the parallel for (in my test case I get 30 elements instead of 53).

I've tried to increase the timespan, or to remove all the "Monitor.TryEnter" code and use just a lock(filesAnalyzed) but still got the same result

I'm not very familiar with the paraller for, so it might be something in the syntax that I'm using.

    public static Dictionary<string,Dictionary<string,bool>> AnalyzeFiles(IEnumerable<string> files, IEnumerable<string> dependencies)
    {
        var filesAnalyzed = new Dictionary<string, Dictionary<string, bool>>();

        Parallel.For<KeyValuePair<string, Dictionary<string, bool>>>(
            //start index
            0,
            //end index
            files.Count(),
            // initialization?
            ()=>new KeyValuePair<string, Dictionary<string, bool>>(),
            (index, loop, result) =>
            {
                var temp = new KeyValuePair<string, Dictionary<string, bool>>(
                               files.ElementAt(index),
                               AnalyzeFile(files.ElementAt(index), dependencies));
                return temp;
            }
            ,
            //finally
            (x) =>
            {
                if (Monitor.TryEnter(filesAnalyzed, new TimeSpan(0, 0, 30)))
                {
                    try
                    {
                        filesAnalyzed.Add(x.Key, x.Value);
                    }
                    finally
                    {
                        Monitor.Exit(filesAnalyzed);
                    }
                }
            }
            );
        return filesAnalyzed;
    }

any feedback is appreciated

4

There are 4 answers

3
MikkaRin On BEST ANSWER

Rewrite your normal loop this way:

   Parallel.Foreach(files, item=>
    {
        filesAnalyzed[item] = AnalyzeFile(item, dependencies);
    });

You should also use ConcurrentDictionary except Dictionary to make all process thread-safe

1
Panagiotis Kanavos On

You can simplify your code a lot if you use Parallel LINQ instead :

public static Dictionary<string,Dictionary<string,bool>> AnalyzeFiles(IEnumerable<string> files, IEnumerable<string> dependencies)
{
    var filesAnalyzed = ( from item in files.AsParallel()
                          let result=AnalyzeFile(item, dependencies)
                          select (Item:item,Result:result)
                        ).ToDictionary( it=>it.Item,it=>it.Result)               
    return filesAnalyzed;
}

I used tuple syntax in this case to avoid noise. It also cuts down on allocations.

Using method syntax, the same can be written as :

var filesAnalyzed = files.AsParallel()
                         .Select(item=> (item, AnalyzeFile(item, dependencies)))
                         .ToDictionary( it=>it.Item,it=>it.Result)               

Dictionary<> isn't thread-safe for modification. If you wanted to use Parallel.ForEach without locking, you'd have to use ConcurrentDictionary

var filesAnalyzed = ConcurrentDictionary<string,Dictionary<string,bool>>;

Parallel.ForEach(files,file => {
    filesAnalyzed[item] = AnalyzeFile(item, dependencies);
});

In this case at least, there is no benefit in using Parallel over PLINQ.

5
Magnus On

Assuming the code inside AnalyzeFile and dependencies is thread safe, how about something like this:

var filesAnalyzed = files
    .AsParellel()
    .Select(x => new{Item = x, File = AnalyzeFile(x, dependencies)})
    .ToDictionary(x => x.Item, x=> x.File);
0
Dogu Arslan On

Hard to say what is exactly going wrong without debugging the code. Just looking at it though I would have used a ConcurrentDictionary for filesAnalyzed variable instead of a normal `Dictionary and get rid of the Monitor.

I would also check whether same key already exists in the dictionary filesAnalyzed, it could be that you are trying to add a kvp withthe key that is added to the dictionary already.