I'm using scikit-learn and numpy and I want to set the global seed so that my work is reproducible.
Should I use numpy.random.seed or random.seed?
From the link in the comments, I understand that they are different, and that the numpy version is not thread-safe. I want to know specifically which one to use to create IPython notebooks for data analysis. Some of the algorithms from scikit-learn involve generating random numbers, and I want to be sure that the notebook shows the same results on every run.
That depends on whether your code uses numpy's random number generator or the one in random.

The random number generators in numpy.random and random have totally separate internal states, so numpy.random.seed() will not affect the random sequences produced by random.random(), and likewise random.seed() will not affect numpy.random.randn() etc. If you are using both random and numpy.random in your code, then you will need to set the seeds for both separately.

Update
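To illustrate the point about the two generators having separate states, here is a minimal sketch. The helper name seed_everything and the seed values are my own choices for illustration; only random.seed(), random.random(), np.random.seed() and np.random.rand() are real library calls.

```python
import random

import numpy as np

# Give numpy's global generator a known starting state.
np.random.seed(1)

# Seed the standard-library generator and take one draw from each module.
random.seed(0)
py_first = random.random()
np_first = np.random.rand()

# Re-seeding `random` replays its sequence, but numpy's generator simply
# carries on from where it was: its state was not touched.
random.seed(0)
py_second = random.random()
np_second = np.random.rand()

print(py_first == py_second)  # True: `random` was reset
print(np_first == np_second)  # False: numpy was not


def seed_everything(seed):
    """Hypothetical helper: seed both global generators in one call."""
    random.seed(seed)
    np.random.seed(seed)
```

Calling seed_everything(some_int) once at the top of a notebook covers both modules if your code happens to use both.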
Your question seems to be specifically about scikit-learn's random number generators. As far as I can tell, scikit-learn uses numpy.random throughout, so you should use np.random.seed() rather than random.seed().

One important caveat is that np.random is not threadsafe: if you set a global seed, then launch several subprocesses and generate random numbers within them using np.random, each subprocess will inherit the RNG state from its parent, meaning that you will get identical random variates in each subprocess. The usual way around this problem is to pass a different seed (or numpy.random.RandomState instance) to each subprocess, such that each one has a separate local RNG state.

Since some parts of scikit-learn can run in parallel using joblib, you will see that some classes and functions have an option to pass either a seed or an np.random.RandomState instance (e.g. the random_state= parameter to sklearn.decomposition.MiniBatchSparsePCA). I tend to use a single global seed for a script, then generate new random seeds based on the global seed for any parallel functions.
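Here is a rough sketch of that last approach, using only numpy. The names GLOBAL_SEED, N_WORKERS, derive_seeds and worker are hypothetical, and the "workers" are plain function calls rather than real subprocesses, to keep the example short. The idea is that each worker builds its own local np.random.RandomState from a seed derived from the global one, instead of relying on the shared global state.

```python
import numpy as np

GLOBAL_SEED = 0  # hypothetical value; any integer works
N_WORKERS = 4    # hypothetical number of parallel jobs


def derive_seeds(global_seed, n):
    """Draw n integer seeds from a generator seeded with the global seed."""
    master = np.random.RandomState(global_seed)
    return [int(s) for s in master.randint(0, 2**31 - 1, size=n)]


def worker(seed, n_draws=3):
    """Stand-in for a parallel job: uses a local RNG, not the global one."""
    rng = np.random.RandomState(seed)
    return rng.randn(n_draws)


seeds = derive_seeds(GLOBAL_SEED, N_WORKERS)
results = [worker(s) for s in seeds]

for s, r in zip(seeds, results):
    print(s, r)
```

Each worker gets a different stream, and the whole run is reproducible from the single global seed. The same derived integers could be passed as the random_state= argument of scikit-learn classes and functions that accept one.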