I have put together a small test harness to diagnose why the throughput of my C# data processing application (its core function selects records in batches of 100 from a remote database server using non-blocking IO and performs simple processing on them) is much lower than it could be. I've observed that while running, the app encounters no bottlenecks in the way of CPU (<3%), network or disk IO, or RAM and does not stress the database server (the data set on the database is almost always entirely in RAM). If I run multiple instances of the app in parallel, I can get up to ~45 instances with only ~10% degradation in latency but with a 45x increase in throughput before CPU utilization on the database server becomes a bottleneck (at that point, there are still no resource bottlenecks on the client).
My question is why doesn't the TPL increase the number of Tasks in flight or otherwise increase throughput when the client server is capable of substantially higher throughput?
Simplified code excerpt:
public static async Task ProcessRecordsAsync()
{
int max = 10000;
var s = new Stopwatch();
s.Start();
Parallel.For(0, max, async x =>
{
await ProcessFunc();
});
s.Stop();
Console.WriteLine("{2} Selects completed in {0} ms ({1} per ms).", s.ElapsedMilliseconds, ((float)s.ElapsedMilliseconds) / max, max);
}
public static async Task ProcessFunc()
{
string sql = "select top 100 MyTestColumn from MyTestTable order by MyTestColumn desc;";
string connStr = "<blah>...";
using (SqlConnection conn = new SqlConnection(connStr))
{
try
{
conn.Open();
SqlCommand cmd = new SqlCommand(sql, conn);
DbDataReader rdr = await cmd.ExecuteReaderAsync();
while (rdr.Read())
{
// do simple processing here
}
rdr.Close();
}
catch (Exception ex)
{
Console.WriteLine(ex.ToString());
}
}
}
Parallel For doesn't try to choke the life out of your processor and maximize the number of concurrent threads doing work for you. It uses the number of cores as a starting point and may ramp up depending on the nature of the workload. See this question.
As it happens, you actually do have blocking IO...when opening the connection and reading rows. You might try this instead: