Virtual threads slower than single threads


I have a number of tasks where the work can be done in parallel. I have tried running them both single-threaded and multi-threaded. The code is as follows:

private MyTaskResult doTasks( ... ) {
  
    try ( var scope = new StructuredTaskScope.ShutdownOnFailure() ) {
  
      var negTask =
        MyTask.create(
          ...
        );
  
      var posTask =
        MyTask.create(
          ...
        );
  
      var negTaskFuture = (
        CoreConfig.useMultipleThreads() ?
        scope.fork( negTask::task ) :
        new ImmediateFuture<>( negTask.task() )
      );
      
      var posTaskFuture = (
        CoreConfig.useMultipleThreads() ?
        scope.fork( posTask::task) :
        new ImmediateFuture<>( posTask.task() )
      );
  
      scope.join();
      scope.throwIfFailed();
  
      return new MyTaskResult(
        negTaskFuture.resultNow(),
        posTaskFuture.resultNow()
      );
  
    }
    catch ( InterruptedException | ExecutionException ex ) {
      throw new Error( ex );
    }
    
  }

Timings for single-threaded execution are typically around 3 ms for each of the two tasks, plus some overhead, for 7 ms total. Timings for multi-threaded execution are around 9 ms for each task, plus some overhead, for 11 ms total. The timings are somewhat inconsistent: sometimes one task takes as little as 3 ms while the other takes the usual 9 ms. The task objects are created with their own data items (strictly immutable) and do not interact with other sources of data. The tasks do some calculations and build a data structure from the results.

My expectation was that the single-threaded elapsed time of 7 ms would drop to around 4-5 ms. This example is not important in itself, but it is not the only place where I do this kind of work, and I get similar results from other blocks of code. I did not present those examples because they reference a central data repository, and all the spawned threads must contend for read and write access to it (I have used ReentrantReadWriteLock), so in that case there could be blocking. Even so, the results are similar to the case presented above, with multi-threading taking 2-3x as long, albeit with a larger number of threads. Note that synchronized is not used.

I must be doing something dumb. Please help.

Environment is IntelliJ, latest version.


There is 1 answer

ljm599 On

I did eventually solve this problem. It took some time because the program is moderately complex and a lot of re-writing and experimentation was required. Contention accessing a large shared object central to the calculation was the cause in some of the cases. Contention appears to approach 100% and access to this object appears to have been the limiting factor.

Removal of the contention then revealed a set of [less] contended logs and timers which were also reducing performance. Removal of contention on these objects then resulted in the expected performance improvements.
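For readers wondering what a lower-contention timer might look like, here is a minimal sketch. It is not the code from my program: the class name LowContentionTimer and the methods timed and runDemo are made up for illustration, and it assumes the timers only need running totals. It uses java.util.concurrent.atomic.LongAdder, which spreads updates across internal cells so that concurrent writers rarely collide on one memory location. A plain fixed thread pool is used so the sketch runs without preview flags.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical example class, not part of the original program.
public class LowContentionTimer {

    // LongAdder keeps several internal cells, so concurrent add() calls
    // usually update different cells instead of fighting over one field.
    static final LongAdder totalNanos = new LongAdder();
    static final LongAdder calls = new LongAdder();

    // Runs the work and records its duration without taking any lock.
    static void timed(Runnable work) {
        long start = System.nanoTime();
        work.run();
        totalNanos.add(System.nanoTime() - start);
        calls.increment();
    }

    // Starts `threads` workers, each timing `perThread` trivial operations,
    // and returns the total number of recorded calls.
    static long runDemo(int threads, int perThread) throws InterruptedException {
        calls.reset();
        totalNanos.reset();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    timed(() -> Math.sqrt(12345.678));
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return calls.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        long recorded = runDemo(4, 1000);
        System.out.println("calls=" + recorded + " totalNanos=" + totalNanos.sum());
    }
}
```

The totals are only read once the workers have finished, which is the usage LongAdder is designed for: frequent writes, infrequent reads.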

The lesson I took from this is that achieving performance improvements through concurrency is not straightforward when there are significant shared resources.

In my case a partial solution required a complete re-write to remove all contention and store the results of the calculations run in each thread, to be later merged into the shared central resource. This merging had additional costs because the threads run similar calculations in parallel resulting in some data duplication that had to be dealt with.
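The general shape of that approach can be sketched as follows. This is only an illustration under stated assumptions, not my actual code: the class LocalThenMerge, the compute method, and the overlapping ranges are hypothetical stand-ins for the real calculation. Each task writes to its own private map, and a single thread merges the results afterwards, dropping entries that more than one task produced. A plain fixed thread pool is used so the sketch runs without preview flags.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class, not part of the original program.
public class LocalThenMerge {

    // Stand-in for the real calculation: fills a map owned by this task only,
    // so no locking is needed while the task runs.
    static Map<Integer, Long> compute(int from, int to) {
        Map<Integer, Long> local = new HashMap<>();
        for (int i = from; i < to; i++) {
            local.put(i, (long) i * i);
        }
        return local;
    }

    // Runs two tasks with overlapping ranges, then merges on the calling thread.
    static Map<Integer, Long> runAndMerge() throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Callable<Map<Integer, Long>>> tasks = List.of(
                () -> compute(0, 60),
                () -> compute(40, 100));

            Map<Integer, Long> shared = new HashMap<>();
            for (Future<Map<Integer, Long>> future : pool.invokeAll(tasks)) {
                // putIfAbsent keeps the first value and ignores duplicates
                // (keys 40..59 are computed by both tasks).
                future.get().forEach(shared::putIfAbsent);
            }
            return shared;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAndMerge().size()); // prints 100
    }
}
```

The merge step is sequential, so it only pays off when the parallel calculation is expensive relative to merging and de-duplicating the results.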