Fixing set.seed for an entire session

45.9k views Asked by At

I am using R to construct an agent based model with a monte carlo process. This means I got many functions that use a random engine of some kind. In order to get reproducible results, I must fix the seed. But, as far as I understand, I must set the seed before every random draw or sample. This is a real pain in the neck. Is there a way to fix the seed?

set.seed(123)
print(sample(1:10,3))
# [1] 3 8 4
print(sample(1:10,3))
# [1]  9 10  1
set.seed(123)
print(sample(1:10,3))
# [1] 3 8 4
6

There are 6 answers

3
Gavin Simpson On BEST ANSWER

There are several options, depending on your exact needs. I suspect the first option, the simplest is not sufficient, but my second and third options may be more appropriate, with the third option the most automatable.

Option 1

If you know in advance that the function using/creating random numbers will always draw the same number, and you don't reorder the function calls or insert a new call in between existing ones, then all you need do is set the seed once. Indeed, you probably don't want to keep resetting the seed as you'll just keep on getting the same set of random numbers for each function call.

For example:

> set.seed(1)
> sample(10)
 [1]  3  4  5  7  2  8  9  6 10  1
> sample(10)
 [1]  3  2  6 10  5  7  8  4  1  9
> 
> ## second time round
> set.seed(1)
> sample(10)
 [1]  3  4  5  7  2  8  9  6 10  1
> sample(10)
 [1]  3  2  6 10  5  7  8  4  1  9

Option 2

If you really want to make sure that a function uses the same seed and you only want to set it once, pass the seed as an argument:

foo <- function(...., seed) {
  ## set the seed
  if (!missing(seed)) 
    set.seed(seed) 
  ## do other stuff
  ....
}

my.seed <- 42
bar <- foo(...., seed = my.seed)
fbar <- foo(...., seed = my.seed)

(where .... means other args to your function; this is pseudo code).

Option 3

If you want to automate this even more, then you could abuse the options mechanism, which is fine if you are just doing this in a script (for a package you should use your own options object). Then your function can look for this option. E.g.

foo <- function() {
  if (!is.null(seed <- getOption("myseed")))
    set.seed(seed)
  sample(10)
}

Then in use we have:

> getOption("myseed")
NULL
> foo()
 [1]  1  2  9  4  8  7 10  6  3  5
> foo()
 [1]  6  2  3  5  7  8  1  4 10  9
> options(myseed = 42)
> foo()
 [1] 10  9  3  6  4  8  5  1  2  7
> foo()
 [1] 10  9  3  6  4  8  5  1  2  7
> foo()
 [1] 10  9  3  6  4  8  5  1  2  7
> foo()
 [1] 10  9  3  6  4  8  5  1  2  7
1
TilmanHartley On

I think this question suffers from a confusion. In the example, the seed has been set for the entire session. However, this does not mean it will produce the same set of numbers every time you use the print(sample)) command during a run; that would not resemble a random process, as it would be entirely determinate that the same three numbers would appear every time. Instead, what actually happens is that once you have set the seed, every time you run a script the same seed is used to produce a pseudo-random selection of numbers, that is, numbers that look as if they are random but are in fact produced by a reproducible process using the seed you have set.

If you rerun the entire script from the beginning, you reproduce those numbers that look random but are not. So, in the example, the second time that the seed is set to 123, the output is again 9, 10, and 1 which is exactly what you'd expect to see because the process is starting again from the beginning. If you were to continue to reproduce your first run by writing print(sample(1:10,3)), then the second set of output would again be 3, 8, and 4.

So the short answer to the question is: if you want to set a seed to create a reproducible process then do what you have done and set the seed once; however, you should not set the seed before every random draw because that will start the pseudo-random process again from the beginning.

This question is old, but still comes high in search results, and it seemed worth expanding on Spacedman's answer.

5
Aaron left Stack Overflow On

No need. Although the results are different from sample to sample (which you almost certainly want, otherwise the randomness is very questionable), results from run to run will be the same. See, here's the output from my machine.

> set.seed(123)
> sample(1:10,3)
[1] 3 8 4
> sample(1:10,3)
[1]  9 10  1
0
hd1 On

You could do a wrapper function, like so:

> wrap.3.digit.sample <- function(x) {
+    set.seed(123)
+    return(sample(x, 3))
+ }
> wrap.3.digit.sample(c(1:10))
[1] 3 8 4
> wrap.3.digit.sample(c(1:10))
[1] 3 8 4

There is probably a more elegant way, and I'm sure someone will chime in with it. But, if they don't, this should make your life easier.

0
alittleboy On

I suggest that you set.seed before calling each random number generator in R. I think what you need is reproducibility for Monte Carlo simulations. If in a for loop, you can set.seed(i) before calling sample, which guarantees to be fully reproducible. In your outer function, you may specify an argument seed=1 so that in the for loop, you use set.seed(i+seed).

2
stevec On

If you want to always return the same results from random processes, simply keep the seed set all the time with:

addTaskCallback(function(...) {set.seed(123);TRUE})

Now the output is the same every time:

print(sample(1:10,3))
# [1] 3 8 4
print(sample(1:10,3))
# [1] 3 8 4