What is the low-level difference between using:
ForkJoinPool pool = new ForkJoinPool(X);
and
ExecutorService ex = Executors.newWorkStealingPool(X);
where X is the desired level of parallelism, i.e. the number of threads running?
According to the docs, the two look similar to me. Which one is more appropriate and safe for normal use?
I have 130 million entries to write through a BufferedWriter and then sort by the first column using Unix sort.
Also, if possible, let me know how many threads I should use.
Note: My system has 8 cores and 32 GB of RAM.
Work stealing is a technique used by modern thread pools to decrease contention on the work queue.
A classical thread pool has one queue: each pool thread locks the queue, dequeues a task, and then unlocks the queue. If the tasks are short and there are many of them, there is a lot of contention on the queue. Using a lock-free queue really helps here, but doesn't solve the problem entirely.
Modern thread pools use work stealing: each thread has its own queue. When a pool thread produces a task, it enqueues it on its own queue. When a pool thread wants a task, it first tries to dequeue one from its own queue, and if that queue is empty, it "steals" work from the other threads' queues. This greatly reduces contention in the thread pool and improves performance.
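To make the per-thread queue idea concrete, here is a minimal sketch using the fork/join API. The class name WorkStealingSum, the SumTask helper, and the THRESHOLD value are made up for illustration and are not tuned. Calling fork() pushes the subtask onto the current worker's own queue, where an idle worker may steal it.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class WorkStealingSum {
    // Hypothetical task: sums a slice of an array by recursively splitting it.
    static final class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 10_000; // illustrative cutoff, not tuned
        private final long[] data;
        private final int from;
        private final int to;

        SumTask(long[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                // Small enough: sum sequentially on the current worker.
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += data[i];
                }
                return sum;
            }
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                      // queued on this worker's own queue; idle workers may steal it
            long rightSum = right.compute();  // keep working on the other half ourselves
            return left.join() + rightSum;    // wait for (or help run) the forked half
        }
    }

    static long parallelSum(long[] data, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println(parallelSum(data, 4)); // prints 499999500000
    }
}
```

Forking one half and computing the other in the calling thread is a common pattern here: it keeps the current worker busy and leaves exactly one stealable task per split.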
newWorkStealingPool creates a work-stealing thread pool whose number of threads defaults to the number of available processors (or to the parallelism level you pass in). newWorkStealingPool presents a new problem. If I have four logical cores, the pool will have four threads in total. If my tasks block, for example on synchronous IO, I don't utilize my CPUs enough. What I want is four active threads at any given moment: for example, four threads that encrypt AES and another 140 threads that wait for the IO to finish.
This is what ForkJoinPool provides: if your task spawns new tasks and then waits for them to finish, the pool will inject new active threads in order to saturate the CPU. It is worth mentioning that ForkJoinPool uses work stealing too; in OpenJDK, newWorkStealingPool is in fact built on a ForkJoinPool configured in async (FIFO) mode, so the practical difference lies mostly in the API you program against and the order in which queued tasks run.
Which one to use? If you work with the fork-join model or you know your tasks block indefinitely, use ForkJoinPool. If your tasks are short and mostly CPU-bound, use newWorkStealingPool.
With all that said, modern applications tend to use a thread pool sized to the number of available processors and rely on asynchronous IO and lock-free containers to avoid blocking. This (usually) gives the best performance.
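If you want to see what the two constructions from the question give you at runtime, a small check like the following can help. This is only a sketch: PoolComparison and describe are hypothetical names, and x = 8 simply mirrors the asker's core count rather than being a recommendation. The declared return type of newWorkStealingPool is only ExecutorService, so the code does not assume a ForkJoinPool and falls back to printing the class name.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;

public class PoolComparison {
    // Reports what an ExecutorService actually is at runtime.
    static String describe(ExecutorService ex) {
        if (ex instanceof ForkJoinPool) {
            ForkJoinPool fjp = (ForkJoinPool) ex;
            return "ForkJoinPool parallelism=" + fjp.getParallelism()
                    + " asyncMode=" + fjp.getAsyncMode();
        }
        // Not guaranteed by the API to be a ForkJoinPool, so fall back to the class name.
        return ex.getClass().getName();
    }

    public static void main(String[] args) {
        int x = 8; // the X from the question; assumption: one thread per core
        ForkJoinPool direct = new ForkJoinPool(x);
        ExecutorService stealing = Executors.newWorkStealingPool(x);
        try {
            System.out.println(describe(direct));
            System.out.println(describe(stealing));
        } finally {
            direct.shutdown();
            stealing.shutdown();
        }
    }
}
```

On the OpenJDK builds I am aware of, the second line also reports a ForkJoinPool, with asyncMode=true. Async mode processes forked tasks in FIFO order, which suits event-style tasks that are submitted and never joined.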