Parallel.ForEach loop is performing like a serial loop

578 views Asked by At

I've spent about 8+ hours searching online for help and I couldn't find anything so, here goes.

I'm working with Team Foundation Server and C# and I'm trying to acquire a list of work items and convert them into a generic object we made to be bound to a special UI. To work items are Tasks for a specific day and the list is about 30ish items in size, so not that big of a deal.

The loop looks like this:

List<IWorkItemData> workitems = new List<IWorkItemData>();
var queryForData = Store.Query(query).Cast<WorkItem>();

if (queryForData.Count() == 0)
    return workitems;

Parallel.ForEach(queryForData, (wi) =>
{
    var temp = wi;
    lock (workitems)
    {
        TFSWorkItemData tfsWorkItem = new TFSWorkItemData(temp);
        workitems.Add(tfsWorkItem);
    }
});

The inside of TFSWorkItemData's constuctor looks like this:

public TFSWorkItemData(WorkItem workItem)
{
    this.workItem = workItem;

    this.Fields = new Dictionary<string, IFieldData>();

    // Add Fields
    foreach (Field field in workItem.Fields)
    {
        TFSFieldData fieldData = new TFSFieldData
        {
            Value = field.Value,
            OldValue = field.OriginalValue,
            ReferenceName = field.ReferenceName,
            FriendlyName = field.Name,
            ValueType = field.FieldDefinition.SystemType
        };
        this.Fields.Add(field.ReferenceName, fieldData);
    }
}

So it takes about 90 seconds to perform this operation. I know it doesn't take that long to grab 30 work items so it must be something I'm doing to cause this to take so long. I know that the lock is a performance hit, but when I remove it I get a InvalidOperationException saying that the collection has been modified. When I look in the details of this exception, the only useful info I can find is that the collection is a dictionary. What's weird is that it doesn't look to me like the field dictionary in the work item is being modified at all. And the dictionary in my class is only being added to and that shouldn't be the culprit unless I'm missing something.

Please help me figure out what I'm doing wrong regarding the dictionaries. I did try moving the parallel foreach loop to the workitem.Fields collection but I can't seem to get that to work.

Edit: Read comments of Answer for answer to this question. Thank you.

2

There are 2 answers

2
L_Laxton On BEST ANSWER

I found a another way that might work for anyone trying to do something similar.

// collect all of the IDs
var itemIDs = Store.Query(query).Cast<WorkItem>().Select(wi = wi.Id).ToArray();

IWorkItemData[] workitems = new IWorkItemData[itemIDs.Length];

// then go through the list, get the complete workitem, create the wrapper,
// and finally add it to the collection
System.Threading.Tasks.Parallel.For(0, itemIDs.Length, i =>
{
    var workItem = Store.GetWorkItem(itemIDs[i]);
    var item = new TFSWorkItemData(workItem);
    workitems[i] = item;
});

Edit: changed list to array

5
Yuval Itzchakov On

Please help me figure out what I'm doing wrong regarding the dictionaries.

The exception is thrown because List<T> is not thread-safe.

You have a shared resource which needs to be modified, using Parallel.ForEach won't really help, as you're moving the bottleneck to the lock, causing the contention there, which is likely as to why you're seeing performance actually degrade. Threads aren't a magical solution. You need to aspire to have as many independent workers which can each do their own work.

Instead, you can try using PLINQ which will partition your enumerable internally. Since you actually want to project each element in the collection, you can use Enumerable.Select:

 var workItems = queryForData.AsParallel().Select(workItem => new TFSWorkItemData(workItem)).ToArray();

In order to find out if this solution is actually better than sequential iteration, benchmark your code. Never assume more threads are faster.