Python something resets my random seed

1.7k views Asked by At

My question is the exact opposite of this one.

This is an excerpt from my test file

f1 = open('seed1234','r')
f2 = open('seed7883','r')
s1 = eval(f1.read())
s2 = eval(f2.read())
f1.close()
f2.close()
####
test_sampler1.random_inst.setstate(s1)
out1 = test_sampler1.run()
self.assertEqual(out1,self.out1_regress) # this is fine and passes

test_sampler2.random_inst.setstate(s2)
out2 = test_sampler2.run()
self.assertEqual(out2,self.out2_regress) # this FAILS

Some info -

test_sampler1 and test_sampler2 are 2 object from a class that performs some stochastic sampling. The class has an attribute random_inst which is an object of type random.Random(). The file seed1234 contains a TestSampler's random_inst's state as returned by random.getstate() when it was given a seed of 1234 and you can guess what seed7883 is. What I did was I created a TestSampler in the terminal, gave it a random seed of 1234, acquired the state with rand_inst.getstate() and save it to a file. I then recreate the regression test and I always get the same output.

HOWEVER

The same procedure as above doesn't work for test_sampler2 - whatever I do not get the same random sequence of numbers. I am using python's random module and I am not importing it anywhere else, but I do use numpy in some places (but not numpy.random).

The only difference between test_sampler1 and test_sampler2 is that they are created from 2 different files. I know this is a big deal and it is totally dependent on the code I wrote but I also can't simply paste ~800 lines of code here, I am merely looking for some general idea of what I might be messing up...

What might be scrambling the state of test_sampler2's random number generator?

Solution

There were 2 separate issues with my code:

1

My script is a command line script and after I refactored it to use python's optparse library I found out that I was setting the seed for my sampler using something like seed = sys.argv[1] which meant that I was setting the seed to be a str, not an int - seed can take any hashable object and I found it the hard way. This explains why I would get 2 different sequences if I used the same seed - one if I run my script from the command line with sth like python sample 1234 #seed is 1234 and from my unit_tests.py file when I would create an object instance like test_sampler1 = TestSampler(seed=1234).

2

I have a function for discrete distribution sampling which I borrowed from here (look at the accepted answer). The code there was missing something fundamental: it was still non-deterministic in the sense that if you give it the same values and probabilities array, but transformed by a permutation (say values ['a','b'] and probs [0.1,0.9] and values ['b','a'] and probabilities [0.9,0.1]) and the seed is set and you will get the same random sample, say 0.3, by the PRNG, but since the intervals for your probabilities are different, in one case you'll get a b and in one an a. To fix it, I just zipped the values and probabilities together, sorted by probability and tadaa - I now always get the same probability intervals.

After fixing both issues the code worked as expected i.e. out2 started behaving deterministically.

1

There are 1 answers

4
Tim Peters On

The only thing (apart from an internal Python bug) that can change the state of a random.Random instance is calling methods on that instance. So the problem lies in something you haven't shown us. Here's a little test program:

from random import Random

r1 = Random()
r2 = Random()

for _ in range(100):
    r1.random()
for _ in range(200):
    r2.random()

r1state = r1.getstate()
r2state = r2.getstate()

with open("r1state", "w") as f:
    print >> f, r1state
with open("r2state", "w") as f:
    print >> f, r2state


for _ in range(100):
    with open("r1state") as f:
        r1.setstate(eval(f.read()))
    with open("r2state") as f:
        r2.setstate(eval(f.read()))
    assert r1state == r1.getstate()
    assert r2state == r2.getstate()

I haven't run that all day, but I bet I could and never see a failing assert ;-)

BTW, it's certainly more common to use pickle for this kind of thing, but it's not going to solve your real problem. The problem is not in getting or setting the state. The problem is that something you haven't yet found is calling methods on your random.Random instance(s).

While it's a major pain in the butt to do so, you could try adding print statements to random.py to find out what's doing it. There are cleverer ways to do that, but better to keep it dirt simple so that you don't end up actually debugging the debugging code.