I'm profiling a multithreaded program running with different numbers of allowed threads. Here are the performance results of three runs of the same input work.
1 thread:
Total thread time: 60 minutes.
Total wall clock time: 60 minutes.
10 threads:
Total thread time: 80 minutes. (Worked 33% longer)
Total wall clock time: 18 minutes. (3.3x speedup)
20 threads:
Total thread time: 120 minutes. (Worked 100% longer)
Total wall clock time: 12 minutes. (5x speedup)
Since it takes more thread time to do the same work, I feel the threads must be contending for resources.
I've already examined the four pillars (CPU, memory, disk I/O, network) on both the app machine and the database server. Memory was the original contended resource, but that's fixed now (more than 1 GB free at all times). CPU hovers between 30% and 70% during the 20-thread test, so there is plenty of headroom. Disk I/O is practically nil on the app machine and minimal on the database server, and the network shows no sign of being a bottleneck.
I've also profiled the code with Red Gate and see no methods waiting on locks. It helps that the threads are not sharing instances. Now I'm checking more nuanced items like database connection establishment/pooling (if 20 threads attempt to connect to the same database, do they have to wait on each other?).
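For reference, this is roughly the shape of the pooling settings I'm looking at (the server/database names are placeholders; SqlClient pools by default with Max Pool Size = 100):

    using System;
    using System.Data.SqlClient;

    class PoolCheck
    {
        static void Main()
        {
            // Placeholders for the real server/database. SqlClient pools by default;
            // Max Pool Size defaults to 100, so 20 threads should only wait on each
            // other if connections are held open longer than expected (e.g. for the
            // duration of a DTC transaction).
            var builder = new SqlConnectionStringBuilder
            {
                DataSource = "dbserver",
                InitialCatalog = "AppDb",
                IntegratedSecurity = true,
                Pooling = true,
                MinPoolSize = 20,   // pre-open one connection per worker thread
                MaxPoolSize = 100   // the default
            };

            using (var conn = new SqlConnection(builder.ConnectionString))
            {
                conn.Open();        // a long pause here would point at pool exhaustion
                Console.WriteLine("Connection state: " + conn.State);
            }
        }
    }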
I'm trying to identify and address the resource contention, so that the 20-thread run would look like this:
20 threads:
Total thread time: 60 minutes. (Worked 0% longer)
Total wall clock time: 6 minutes. (10x speedup)
What are the most likely sources (other than the big 4) that I should be looking at to find that contention?
The code that each thread performs is approximately:
Run ~50 compiled LinqToSql queries
Run ILOG Rules
Call WCF Service which runs ~50 compiled LinqToSql queries, returns some data
Run more ILOG Rules
Call another WCF service which uses devexpress to render a pdf, returns as binary data
Store pdf to network
Use LinqToSql to update/insert. DTC is involved: multiple databases, one server.
The WCF Services are running on the same machine and are stateless and able to handle multiple simultaneous requests.
The machine has 8 CPUs.
What you describe is that you want 100% scalability, that is, a 1:1 relation between the increase in threads and the decrease in wall-clock time... this is usually the goal, but hard to reach...
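Just to quantify that from the numbers you posted (plain arithmetic, nothing assumed beyond your measurements: speedup = serial wall time / parallel wall time, efficiency = speedup / thread count):

    using System;

    class ScalingNumbers
    {
        static void Main()
        {
            const double serialWallMinutes = 60.0;          // the 1-thread run

            Report(10, 18.0, serialWallMinutes);            // -> 3.3x speedup, 33% efficiency
            Report(20, 12.0, serialWallMinutes);            // -> 5.0x speedup, 25% efficiency
        }

        static void Report(int threads, double wallMinutes, double serialWallMinutes)
        {
            double speedup = serialWallMinutes / wallMinutes;   // speedup vs. 1 thread
            double efficiency = speedup / threads;              // 1.0 would be perfect scaling
            Console.WriteLine("{0} threads: {1:0.0}x speedup, {2:P0} efficiency",
                              threads, speedup, efficiency);
        }
    }

Efficiency falling from 100% (1 thread) to 33% (10 threads) to 25% (20 threads) is exactly the signature of contention growing with the thread count.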
For example, you write that there is no memory contention because there is 1 GB free... this is IMHO a wrong assumption... memory contention also means that if two threads try to allocate memory, one may have to wait for the other... another point to keep in mind are the interruptions caused by the GC, which freezes all threads temporarily... the GC can be customized a bit via configuration (gcServer) - see http://blogs.msdn.com/b/clyon/archive/2004/09/08/226981.aspx
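A minimal sketch, assuming you can edit the host's app.config, of turning on server GC and verifying at runtime which mode is active (GCSettings lives in System.Runtime):

    // In app.config / web.config (assuming you control the host's configuration):
    //
    //   <configuration>
    //     <runtime>
    //       <gcServer enabled="true" />
    //     </runtime>
    //   </configuration>

    using System;
    using System.Runtime;

    class GcModeCheck
    {
        static void Main()
        {
            // Server GC gives each CPU its own heap and GC thread, which usually
            // reduces allocation contention in heavily multithreaded processes.
            Console.WriteLine("Server GC enabled: " + GCSettings.IsServerGC);
            Console.WriteLine("GC latency mode:   " + GCSettings.LatencyMode);
        }
    }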
Another point is the WCF service being called... if it can't scale up - for example the PDF rendering - then that is also a form of contention...
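One concrete thing worth ruling out here (an assumption on my part, since your post doesn't show the service configuration): WCF throttles concurrent calls per service host, and in WCF 3.x the default MaxConcurrentCalls is only 16, so 20 caller threads could queue on the service itself. A sketch of raising the throttle programmatically - PdfRenderService and the address are placeholders:

    using System;
    using System.ServiceModel;
    using System.ServiceModel.Description;

    class ThrottledHostDemo
    {
        static void Main()
        {
            // Placeholder service type and address; the point is where the
            // throttling behavior gets attached.
            var host = new ServiceHost(typeof(PdfRenderService),
                                       new Uri("http://localhost:8080/pdf"));
            host.AddServiceEndpoint(typeof(IPdfRender), new BasicHttpBinding(), "");

            var throttle = host.Description.Behaviors.Find<ServiceThrottlingBehavior>();
            if (throttle == null)
            {
                throttle = new ServiceThrottlingBehavior();
                host.Description.Behaviors.Add(throttle);
            }

            // WCF 3.x defaults: 16 calls / 10 sessions / 26 instances.
            // Raise them above the number of worker threads so callers don't queue.
            throttle.MaxConcurrentCalls = 64;
            throttle.MaxConcurrentSessions = 64;
            throttle.MaxConcurrentInstances = 64;

            host.Open();
            Console.WriteLine("Host open - press Enter to stop.");
            Console.ReadLine();
            host.Close();
        }
    }

    [ServiceContract]
    interface IPdfRender
    {
        [OperationContract]
        byte[] Render(string documentId);
    }

    // Stand-in for the real PDF rendering service.
    class PdfRenderService : IPdfRender
    {
        public byte[] Render(string documentId) { return new byte[0]; }
    }

If the clients call the services over HTTP, also check ServicePointManager.DefaultConnectionLimit on the calling side; it defaults to 2 outgoing connections per host, which alone would serialize most of 20 concurrent calls.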
The list of possible contention points is "endless"... and it is rarely only the obvious areas you mentioned...
EDIT - as per comments:
Some points to check:
What database provider do you use? How is it configured?
Possible contention could be happening somewhere inside the library you use...
Check the execution plans for all these queries... some of them may be taking locks and thus creating contention on the DB server side...
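To see whether the DB server really is the choke point, one cheap check is to query the blocking DMVs from a scratch console app while the 20-thread run is executing (SQL Server assumed; the connection string is a placeholder and the query needs VIEW SERVER STATE permission):

    using System;
    using System.Data.SqlClient;

    class BlockingCheck
    {
        static void Main()
        {
            // Run this while the 20-thread job is executing.
            const string sql = @"
                SELECT session_id, blocking_session_id, wait_type, wait_time, command
                FROM sys.dm_exec_requests
                WHERE blocking_session_id <> 0;";

            using (var conn = new SqlConnection(
                "Data Source=dbserver;Initial Catalog=AppDb;Integrated Security=True"))
            using (var cmd = new SqlCommand(sql, conn))
            {
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        Console.WriteLine("session {0} blocked by {1} ({2}, {3} ms)",
                            reader["session_id"], reader["blocking_session_id"],
                            reader["wait_type"], reader["wait_time"]);
                    }
                }
            }
        }
    }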
EDIT 2:
Threads
Are these threads from the ThreadPool? If so, then you won't scale :-(
EDIT 3:
ThreadPool threads are bad for long-running tasks, which is the case in your scenario... for details see http://www.yoda.arachsys.com/csharp/threads/printable.shtml
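If they are pool threads, the usual fix is to give each long-running work item its own dedicated thread and keep the ThreadPool free for the short stuff; a minimal sketch (the work items and the Process method are placeholders for your per-thread job):

    using System;
    using System.Collections.Generic;
    using System.Threading;

    class DedicatedWorkers
    {
        static void Main()
        {
            var workItems = new List<int> { 1, 2, 3, 4, 5 };   // placeholder work items
            var threads = new List<Thread>();

            foreach (var item in workItems)
            {
                var captured = item;   // don't close over the loop variable
                var t = new Thread(() => Process(captured))
                {
                    IsBackground = false   // long-running foreground worker
                };
                t.Start();
                threads.Add(t);
            }

            // The ThreadPool stays free for short tasks (timers, async I/O callbacks)
            // while the long-running work runs on its own threads.
            foreach (var t in threads) t.Join();
        }

        static void Process(int item)
        {
            Console.WriteLine("Processing work item {0} on a dedicated thread", item);
        }
    }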
If you want extreme performance, then it could be worth checking out CQRS and the real-world example described as LMAX.