I'm using org.apache.commons.math3.distribution.NormalDistribution in a large distributed Scala & Akka application. During debugging I found that sample() was occasionally returning NaN, which propagated silently and caused threads to hang in org.apache.commons.math3.ode.nonstiff.DormandPrince853Integrator.
The NaN can be reproduced simply with parallel collections (it doesn't happen in sequential code):
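import org.apache.commons.math3.distribution.NormalDistribution

// One distribution instance (and so one underlying generator) shared by all worker threads
val normal = new NormalDistribution(0, 0.1)
(1 to 1000000000).par.foreach { i =>
  val r = normal.sample
  if (r.isNaN) throw new Exception("r = " + r)
}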
Obviously, moving the val normal inside the foreach solves the issue in this case.
I've looked at the docs but can't see anything warning me of such issues. Have I failed to grasp a more fundamental concept about thread safety? Needless to say I'm now checking for NaN.
Digging through the sources shows that this constructor uses the Well19937c random generator, which doesn't look thread-safe at first glance. You can make it thread-safe by explicitly setting the generator to a SynchronizedRandomGenerator, which wraps any other random number generator (like Well19937c or Mersenne Twister).
Note that by synchronizing access to the random number generator with SynchronizedRandomGenerator you lose any potential performance benefit, and the 'parallel' version will probably be slower than the sequential one because of the synchronization. On the other hand, re-creating the distribution on every iteration in parallel will probably re-seed the PRNG many times with similar values based on the current time, so your results will be skewed.
A very general rule of thumb (and if I'm wrong here, please correct me) is that 99% of the time, unless explicitly stated otherwise, you should stick to sequential execution when doing anything that relies on random number generation, because PRNGs usually store state that can get corrupted when they are called from multiple threads. And unless you're doing expensive computations afterwards, the synchronization (in the case of thread-safe stateful PRNGs) will be a bottleneck.
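As a minimal sketch of the SynchronizedRandomGenerator approach described above, assuming commons-math3 3.1 or later (where NormalDistribution accepts a RandomGenerator in its constructor): the object name SynchronizedSampling and the helper countNaNs are made up for illustration, and plain Futures stand in for the parallel collection from the question so the snippet does not depend on a particular Scala collections version.

```scala
import org.apache.commons.math3.distribution.NormalDistribution
import org.apache.commons.math3.random.{SynchronizedRandomGenerator, Well19937c}

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical demo object, not part of commons-math3.
object SynchronizedSampling {
  // Wrap the non-thread-safe generator so every call into it is synchronized.
  val rng = new SynchronizedRandomGenerator(new Well19937c())

  // One shared distribution, backed by the synchronized generator.
  val normal = new NormalDistribution(
    rng, 0.0, 0.1, NormalDistribution.DEFAULT_INVERSE_ABSOLUTE_ACCURACY)

  // Draw `perTask` samples in each of `tasks` concurrent futures and count the NaNs seen.
  def countNaNs(tasks: Int, perTask: Int): Int = {
    val work = Future.sequence((1 to tasks).map { _ =>
      Future {
        var nans = 0
        var i = 0
        while (i < perTask) {
          if (normal.sample().isNaN) nans += 1
          i += 1
        }
        nans
      }
    })
    Await.result(work, 5.minutes).sum
  }

  def main(args: Array[String]): Unit =
    println(s"NaN samples: ${countNaNs(tasks = 8, perTask = 1000000)}")
}
```

With the wrapper in place the shared generator's state should no longer be corrupted, so no NaN samples are expected. Every sample() call now contends for the same lock, though, which is the bottleneck mentioned above.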