I'm experimenting with threads parking and decided to build some sort of service. Here is how it looks like:
public class TestService {
private static final Logger logger = LoggerFactory.getLogger(TestService.class); // logback I think this logger causes some troubles
private final CountDownLatch stopLatch;
private final Object parkBlocker = new Object();
private volatile boolean stopped;
private final Thread[] workers;
public TestService(int parallelizm) {
stopLatch = new CountDownLatch(parallelizm);
workers = new Thread[parallelizm];
for (int i = 0; i < parallelizm; i++) {
workers[i] = new Thread(() -> {
try {
while (!stopped) {
logger.debug("Parking " + Thread.currentThread().getName());
LockSupport.park(parkBlocker);
logger.debug(Thread.currentThread().getName() + " unparked");
}
} finally {
stopLatch.countDown();
}
});
}
}
public void start() {
Arrays.stream(workers).forEach(t -> {
t.start();
logger.debug(t.getName() + " started");
});
}
public boolean stop(long timeout, TimeUnit unit) throws InterruptedException {
boolean stoppedSuccefully = false;
this.stopped = true;
unparkWorkers();
if (stopLatch.await(timeout, unit)) {
stoppedSuccefully = true;
}
return stoppedSuccefully;
}
private void unparkWorkers() {
Arrays.stream(workers).forEach(w -> {
LockSupport.unpark(w);
logger.debug("Un-park call is done on " + w.getName());
});
}
}
The issue I faced with was that if I then test this service as follows:
public static void main(String[] args) = {
while(true) {
TestService service = new TestService(2);
service.start();
if (!service.stop(10000, TimeUnit.MILLISECONDS))
throw new RuntimeException();
}
}
I sometimes got the following behavior:
14:58:55.226 [main] DEBUG com.pack.age.TestService - Thread-648 started
14:58:55.227 [Thread-648] DEBUG com.pack.age.TestService - Parking Thread-648
14:58:55.227 [main] DEBUG com.pack.age.TestService - Thread-649 started
14:58:55.227 [main] DEBUG com.pack.age.TestService - Un-park call is done on Thread-648
14:58:55.227 [Thread-648] DEBUG com.pack.age.TestService - Thread-648 unparked
14:58:55.227 [main] DEBUG com.pack.age.TestService - Un-park call is done on Thread-649
14:58:55.227 [Thread-649] DEBUG com.pack.age.TestService - Parking Thread-649
Exception in thread "main" java.lang.RuntimeException
at com.pack.age.Test$.main(Test.scala:12)
at com.pack.age.Test.main(Test.scala)
The thread is hanging out on parking:
"Thread-649" #659 prio=5 os_prio=0 tid=0x00007efe4433f000 nid=0x7691 waiting on condition [0x00007efe211c8000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000720739a68> (a java.lang.Object)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at com.pack.age.TestService.lambda$new$0(TestService.java:27)
at com.pack.age.TestService$$Lambda$1/1327763628.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)
I don't see any race in park-unpark in the service. Moreover if the unpark
is called before park
, the park
is guaranteed no to block (that's what javadocs say).
Maybe I misuse LockSupport::park
. Can you suggest any fix?
This has nothing to do with logger, though it's usage brings the problem to the surface. You have a race condition, as simple as that. Before explaining that race condition you need to understand a few things from
LockSupport::unpark
documentation first:The first point is explain here. The short version is : if you have a
thread
that has already been started, but has not yet calledpark
, and within this period of time (between thestart
of the thread andpark
), some other thread callsunpark
on the first one : that thread will not park, at all. The permit will be available immediately. May be this little drawing will make it more clear:Notice how
ThreadB
callsunpark(ThreadA)
between the period whereThreadA
has calledstart
andpark
. As such, whenThreadA
reachespark
: it is guaranteed not to block, exactly like the documentation says.The second point from the same documentation is:
Let's see that via a drawing:
After
ThreadA
callspark
, it will hang forever, sinceThreadB
never callsunpark
on it again. Notice that the call tounpark
was made beforeThreadA
has started (unlike the previous example).And this is exactly what happens in your case:
LockSupport.unpark(w);
(fromunparkWorkers
) is called beforet.start();
frompublic void start(){...}
. In simpler words - your code callsunpark
on bothworkers
before they even start, as such when they ultimately reachpark
- they are stuck, no one is able tounpark
them. The fact that you see this with alogger
and not withSystem::out
has most probably to do with the face that when youprintln
- there is asynchronized
method under the hood.As a matter of fact,
LockSupport
offers exactly the semantics needed to prove this. For this we need (for simplicity :SOProblem service = new SOProblem(1);
)And now we need to insert this in proper methods. First flag that fact that we have called
unpark
:Then reset the flag after a cycle has ended:
Then change the constructor to flag that the thread has started:
Then, when your thread freezes you need to inspect the flag:
where
debug
method is:When the issue re-produces, you have called
unpark
before you calledpark
, that happens whenx = 3
as the output.