What exactly makes Java Virtual Threads better


I am pretty hyped for Project Loom, but there is one thing that I can't fully understand.

Most Java servers use thread pools with a fixed limit on the number of threads (200, 300, ...). However, the OS does not stop you from spawning many more; I've read that with special configuration on Linux you can reach huge numbers.

OS threads are more expensive: they are slower to start and stop, they incur context switching (magnified by their number), and you depend on the OS, which might refuse to give you more threads.

Having said that, virtual threads also consume similar amounts of memory (or at least that is what I understood). With Loom we get tail-call optimizations, which should reduce memory usage. Also, synchronization and thread context copying should still be problems of a similar size.

Indeed, you are able to spawn millions of virtual threads:

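import java.util.ArrayList;
import java.util.List;

public class MillionVirtualThreads {
    public static void main(String[] args) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            // Each virtual thread just sleeps for one second.
            threads.add(Thread.startVirtualThread(() -> {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        // Virtual threads are daemon threads, so wait for them before main returns.
        for (Thread t : threads) {
            t.join();
        }
    }
}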

The code above breaks at around 25k threads with an OOM exception when I use platform threads.
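To run the same workload on both kinds of thread, the loop can be written against `Thread.Builder` (final API in JDK 21). What follows is only a sketch: the class name `ThreadKindDemo`, the helper `runSleepers`, and the thread counts are made up for illustration, and the platform-thread count is deliberately kept small.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class (not from the question); requires JDK 21+.
class ThreadKindDemo {

    // Starts `count` threads from the given builder, each sleeping briefly,
    // waits for all of them, and returns how many ran to completion.
    static int runSleepers(Thread.Builder builder, int count) {
        AtomicInteger finished = new AtomicInteger();
        List<Thread> threads = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            threads.add(builder.start(() -> {
                try {
                    Thread.sleep(Duration.ofMillis(100));
                    finished.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return finished.get();
    }

    public static void main(String[] args) {
        // Same task, two kinds of thread; only the builder differs.
        System.out.println("platform: " + runSleepers(Thread.ofPlatform(), 100));
        System.out.println("virtual:  " + runSleepers(Thread.ofVirtual(), 100_000));
    }
}
```

Raising the platform count toward the tens of thousands is where the OOM described above would be expected to show up, depending on OS limits and stack size settings.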

My question is: what exactly makes these threads so light? What prevents us from spawning 1 million platform threads and working with them? Is it only the context switching that makes regular threads so "heavy"?

One very similar question

Things I found so far:

  • Context switching is expensive. Generally speaking, even in the ideal case where the OS knows how the threads will behave, it still has to give each thread an equal chance to execute, given that they have the same priority. If we spawn 10k OS threads, the OS has to switch between them constantly, and this task alone can occupy up to 80% of the CPU time in some cases, so we have to be very careful with the numbers. With virtual threads, context switching is done by the JVM in user space, which makes it far cheaper.
  • Cheap start/stop. When we interrupt a thread, we essentially tell the task, "Kill the OS thread you are running on". However, if that thread belongs to a thread pool, by the time we ask, the current task might already have released the thread, which is then given to another task, and that other task might receive the interruption signal. This makes interruption quite complex. Virtual threads are simply objects that live on the heap, so we can just let the GC collect them in the background.
  • Hard upper limits (tens of thousands at most) on the number of threads, due to how the OS handles them. The OS can't be fine-tuned to a specific application and programming language, so it has to prepare for the worst-case scenario memory-wise. It has to allocate more memory than will actually be used in order to accommodate all needs. While doing all of this, it has to ensure that vital OS processes keep working. With virtual threads you are limited only by memory, which is cheap.
  • A thread that performs a transaction behaves very differently from a thread that does video processing. Again, the OS has to prepare for the worst-case scenario and accommodate both cases as best it can, which means we get suboptimal performance in most cases. Since virtual threads are spawned and managed by Java itself, the JVM has complete control over them and can apply task-specific optimizations that are not bound to the OS.
  • Resizable stack. The OS gives threads a big stack to fit all use cases. Virtual threads have a resizable stack that lives in heap space; it is dynamically resized to fit the problem, which makes it smaller.
  • Smaller metadata size. Platform threads use about 1 MB each, whereas virtual threads need 200-300 bytes to store their metadata.

There are 4 answers

0
raiks On

Sometimes people have to build systems able to handle an enormous number of simultaneous clients. Native threads are an inadequate means of doing that because of their RAM consumption and context-switching costs.

Virtual threads give us the ability to run millions of I/O-bound tasks simultaneously without changing our mental model.

That's why Golang made its way into the industry (besides Google support). Goroutines are a concept very similar to Java's virtual threads and they solve the same problem.

There are other ways to achieve what virtual threads do (such as NIO and the related Reactor pattern). These, however, entail using message loops and callbacks, which warp your mind (that's why so many people hate JavaScript). There are layers of abstraction on top of them that make things a bit easier, but they also have a cost.

5
gbburkhardt On

Sure wish folks would state which OS they're talking about. I strongly suspect that Java threads have a performance advantage on Windows, but not on Linux.

0
pveentjer On

One big advantage of coroutines (so virtual threads) is that they can generate high levels of concurrency without the drawback of callbacks.

Let me first introduce Little's Law:

concurrency = arrival_rate * latency

And we can rewrite this to:

arrival_rate = concurrency/latency

In a stable system, the arrival rate equals throughput.

throughput = concurrency/latency

To increase throughput, you have 2 options:

  1. Decrease latency, which is typically very hard since you have little influence over how long a remote call or a disk request takes.
  2. Increase concurrency.
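To put numbers on the formula above, here is a small worked example. The figures (100 ms latency, 200 pooled threads, 10,000 virtual threads) and the class name `LittlesLaw` are assumptions chosen for illustration, not measurements.

```java
// Hypothetical helper illustrating throughput = concurrency / latency.
class LittlesLaw {

    // Returns the maximum sustainable throughput in requests per second
    // for a given number of in-flight requests and a latency in milliseconds.
    static double throughputPerSecond(int concurrency, long latencyMillis) {
        return concurrency * 1000.0 / latencyMillis;
    }

    public static void main(String[] args) {
        long latencyMillis = 100; // assumed: each request blocks for 100 ms

        // A pool of 200 blocking threads caps concurrency at 200,
        // so throughput is capped at 200 / 0.1 s = 2,000 requests per second.
        System.out.println(throughputPerSecond(200, latencyMillis));

        // With 10,000 requests in flight the cap rises to 100,000 requests
        // per second, provided the downstream service can keep up.
        System.out.println(throughputPerSecond(10_000, latencyMillis));
    }
}
```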

With regular threads, it is difficult to reach high levels of concurrency with blocking calls due to context switch overhead. Requests can be issued asynchronously in some cases (e.g. NIO + Epoll or Netty io_uring binding), but then you need to deal with callbacks and callback hell.

With a virtual thread, the request is still issued asynchronously under the hood: the virtual thread is parked and another virtual thread is scheduled. Once the response is received, the parked virtual thread is rescheduled, and all of this happens completely transparently. The programming model is much more intuitive than using classic threads with callbacks.
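As a rough illustration of that programming model, the sketch below runs many blocking "requests" at once, each written as plain sequential code. The method `fetch` is a stand-in that just sleeps (an assumption, not a real remote call), and the class name and request count are hypothetical. It needs JDK 21 or later.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo; fetch() simulates a blocking remote call with sleep.
class BlockingStyleDemo {

    // Stand-in for a blocking call such as an HTTP or database request.
    static String fetch(int id) {
        try {
            Thread.sleep(Duration.ofMillis(100));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "response-" + id;
    }

    // Handles `requests` requests, one virtual thread per request,
    // and returns how many responses were received.
    static int handleAll(int requests) {
        AtomicInteger handled = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < requests; i++) {
                int id = i;
                executor.submit(() -> {
                    // Blocks only this virtual thread; no callback is needed.
                    String response = fetch(id);
                    if (response.equals("response-" + id)) {
                        handled.incrementAndGet();
                    }
                });
            }
        } // close() waits for all submitted tasks to finish
        return handled.get();
    }

    public static void main(String[] args) {
        System.out.println("handled: " + handleAll(10_000));
    }
}
```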

1
Lunatic On

Virtual threads are layered on top of platform threads, so you can think of them as an illusion that the JVM provides. The whole idea is to tie up a platform thread only for the CPU-bound portions of a task.

What exactly makes Java Virtual Threads better?

Virtual threads advantages

  • They exhibit exactly the same behavior as platform threads.
  • They are disposable and can be scaled to millions.
  • They are much more lightweight than platform threads.
  • They are fast to create, about as fast as creating a String object.
  • The JVM uses delimited continuations at I/O operations, so a virtual thread does not hold a platform thread while it waits on I/O.
  • You can keep writing sequential code as before, but it is far more effective.
  • The JVM gives the illusion of virtual threads; underneath, everything still runs on platform threads.
  • Simply by using virtual threads, each CPU core handles far more concurrent work. Combining virtual threads, a multi-core CPU, and CompletableFuture to parallelize code is very powerful.

Virtual threads usage cautions

  • Do not use monitors (i.e. synchronized blocks), since a virtual thread that blocks inside one is pinned to its platform thread; this is expected to be fixed in a newer JDK release. An alternative is to use 'ReentrantLock' with a try-finally statement.

  • Blocking with native frames on the stack (JNI) is also a problem, but it is very rare.

  • Control the memory used per stack (reduce thread locals and avoid deep recursion).

  • Monitoring tools such as debuggers, JConsole, and VisualVM have not all been updated yet.

  • Platform threads versus virtual threads. Platform threads hold OS threads hostage during I/O-based tasks, and the number of operations in flight is limited by the number of threads available in the thread pool and in the OS. By default they are non-daemon threads.

  • Virtual threads are implemented by the JVM. During CPU-bound work a virtual thread is associated with a platform thread; when it blocks on I/O, the platform thread is returned to the pool, and once the I/O operation finishes a thread is taken from the pool again, so no OS thread is held hostage.
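The 'ReentrantLock' with try-finally pattern mentioned in the first caution can be sketched as follows. The class `LockedCounter` and its methods are hypothetical names used only for this example; the `java.util.concurrent` calls are standard, and the executor requires JDK 21 or later.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical counter guarded by a ReentrantLock instead of synchronized.
class LockedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int value;

    void increment() {
        lock.lock();
        try {
            value++;           // critical section
        } finally {
            lock.unlock();     // always release, even if the body throws
        }
    }

    int get() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }

    // Increments one shared counter from `tasks` virtual threads.
    static int countWithVirtualThreads(int tasks) {
        LockedCounter counter = new LockedCounter();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> counter.increment());
            }
        } // close() waits for every task before the counter is read
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithVirtualThreads(10_000));
    }
}
```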

A four-level architecture view, for better understanding:

CPU

  • Multicore CPU: multiple cores within the CPU execute operations.

OS

  • OS threads: the OS scheduler allocates CPU time to the active OS threads.

JVM

  • Platform threads are thin wrappers around OS threads for both kinds of work (CPU-bound and I/O-bound).
  • Virtual threads are associated with a platform thread for each CPU-bound operation; a virtual thread can be associated with different platform threads at different times.

Virtual threads with ExecutorService

  • It is more effective to use an ExecutorService. A classic executor is tied to a thread pool and limited by the number of threads in it; with an ExecutorService backed by virtual threads, we do not need to size or manage a thread pool.

     try(ExecutorService service = Executors.newVirtualThreadPerTaskExecutor()) {
         service.submit(ExecutorServiceVirtualThread::taskOne);
         service.submit(ExecutorServiceVirtualThread::taskTwo);
     }
    
  • ExecutorService implements the AutoCloseable interface as of JDK 19. When it is used in a try-with-resources statement, close() is called once the end of the try block is reached: the main thread waits until all submitted tasks, each running in its own virtual thread, have finished, and the executor is then shut down.

     ThreadFactory factory = Thread.ofVirtual().name("user thread-", 0).factory();
     try(ExecutorService service = Executors.newThreadPerTaskExecutor(factory)) {
         service.submit(ExecutorServiceThreadFactory::taskOne);
         service.submit(ExecutorServiceThreadFactory::taskTwo);
     }
    
  • An ExecutorService can also be created from a virtual-thread factory, by passing the ThreadFactory to Executors.newThreadPerTaskExecutor.

  • You still get ExecutorService features such as Future and CompletableFuture.
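A minimal sketch of the Future and CompletableFuture point, assuming JDK 21 or later. The class `FuturesOnVirtualThreads` and its methods are hypothetical, and the arithmetic stands in for real work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo of Future and CompletableFuture on virtual threads.
class FuturesOnVirtualThreads {

    static int square(int n) {
        return n * n;
    }

    // Submits one Callable per number and sums the results via Future.get().
    static int sumOfSquares(int upTo) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 1; i <= upTo; i++) {
                int n = i;
                futures.add(executor.submit(() -> square(n)));
            }
            int sum = 0;
            for (Future<Integer> future : futures) {
                sum += future.get(); // blocks this thread until the task is done
            }
            return sum;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Runs a CompletableFuture pipeline on a virtual-thread executor.
    static int squarePlusOne(int n) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            return CompletableFuture
                    .supplyAsync(() -> square(n), executor)
                    .thenApply(result -> result + 1)
                    .join();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10));
        System.out.println(squarePlusOne(3));
    }
}
```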

Find more in JEP 425.