I was attempting to use worker threads to speed up a larger algorithm when I noticed that using independent priority queue's on more threads actually slowed performance down. So I wrote a small test case.
In which I query how many threads to start, set each thread to its own processor, and the push and pop a lot of stuff from my priority queues. Each thread owns it's own priority queue, and they're allocated separately so I don't suspect false sharing.
I put the test case here, because it's longer than a snippet. (The processor affinity bit comes from NCrunch)
The priority queue is of my own creation because .NET didn't have a built in queue. It uses a Pairing Heap if that makes any difference.
At any rate if I run the program with one thread and one core, it gets about 100% usage. Usage drops with two threads / two cores And eventually pittles down to 30% usage with all 8 cores.
Which is a problem because the drop in performance nullifies any would be gain from multithreading. What's causing the drop in performance? Each queue is completely independent of the other thread's
Some problems like solving pi are more suited to parallelization and hyperthreading can actually give you a speed up. When you are dealing with a heavy memory problem like you are, Hyperthreading can't help and can actually hurt. Check out "pipelining" in CPU architecture.
There are not many practical problems for which you can get a 2x speedup by using 2-cpus. The more cpus, the more overhead. In your test case algorithm, I suspect cores are having to wait for the memory subsystem. If you tweak the memory requirements, you will see an increase in performance (and utilization) as you move the memory requirements closer to the CPU cache-size.