Generate random numbers inside spmd in MATLAB

I am running a Monte Carlo simulation in MATLAB and parallelising it because it takes so long to run.

The main objective is to create a really big panel data set and use it to estimate some regressions.

The problem is that without parallelisation the simulation takes A LOT of time to run, so I decided to use spmd. However, the results of the parallelised code are very different from those of the serial code.

rng(3857);
for r=1:MCREP
    Ycom=[];
    Xcom=[];
    YLcom=[];

    spmd
        for it=labindex:numlabs:NT
            % (code to generate the different components: alpha, delta, x_it, eps_it)
            % e.g. x_it=2+1*randn(TT,1);
            % (uses the random number generator: randn)

            % Create the observations for the different time periods of each individual
            for t=2:TT
                yi(t)=xi*alpha+mu*delta+rho*yi(t-1)+beta*x_it(t)+eps_it(t);
                yLi(t)=yi(t-1);
            end

            % Concatenate each individual into a big matrix: create the panel
            Ycom=[Ycom yi];
            Xcom=[Xcom x_it];
            YLcom=[YLcom yLi];
        end
    end

    % Retrieve the data stored in Composite form
    mm=matlabpool('size');
    for i=1:mm
        Y(:,(i-1)*(NT/mm)+1:i*(NT/mm))=Ycom{i};
        X(:,(i-1)*(NT/mm)+1:i*(NT/mm))=Xcom{i};
        YL(:,(i-1)*(NT/mm)+1:i*(NT/mm))=YLcom{i};
    end

    % (rest of the code, run regressions)

end

The intensive part of the code is the one parallelised with spmd: it creates a really large panel data set in which columns are independent individuals and rows are dependent time periods.

My main problem is that the results with parallelisation are different from the results without it; moreover, the results differ depending on whether I use 8 or 16 workers. However, running the code without parallelisation is unfeasible because of the time it takes.
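
Separately from the random numbers, the column order can also depend on the worker count: the loop `it=labindex:numlabs:NT` hands out individuals round-robin, while the retrieval step writes each worker's output into a contiguous block. A minimal sketch of a retrieval that restores the original order, using a plain cell array as a stand-in for the Composite so that it runs without a pool (the small sizes and the dummy data are illustrative assumptions):

```matlab
NT = 12;   % number of individuals (illustrative size)
TT = 3;    % number of time periods (illustrative size)
mm = 4;    % number of workers (illustrative)

% Stand-in for the Composite: worker w holds individuals w, w+mm, w+2*mm, ...
Ycom = cell(1, mm);
for w = 1:mm
    cols = w:mm:NT;
    Ycom{w} = repmat(cols, TT, 1);   % dummy data: each column holds its individual index
end

% Write each worker's columns back to the positions they were taken from
Y = zeros(TT, NT);
for w = 1:mm
    Y(:, w:mm:NT) = Ycom{w};
end
```

With this indexing, column `it` of `Y` holds individual `it` regardless of `mm`. Whether the column order matters depends on what the regressions do with the panel.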

I believe the problem comes from the random number generation, but I cannot fix the seed inside the spmd block, because that would mean fixing the seed inside the Monte Carlo loop, so all the repetitions would use the same numbers.

I would like to know how I can set up the random number generator so that I get the same results no matter how many workers I use.

PS. Another solution would be to put the spmd in the outermost loop (the Monte Carlo loop); however, I do not see a performance gain when I parallelise that way.

Thank you very much for your help.

There is 1 answer

Xxxo

Heh... random number generators in MATLAB's parallel execution are indeed an issue.

MATLAB's documentation page on random number streams (http://www.mathworks.com/help/matlab/math/creating-and-controlling-a-random-number-stream.html) states that only two generators support multiple streams. These two have a shorter period than the default generator (see the table at that link).

BUT!!! The default generator (mt19937ar) can be seeded differently in order to get different results :)

Thus, what you can do is start with mrg32k3a, draw a random number on each worker, and then use this random number together with the worker index to seed an mt19937ar generator.

E.g.

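spmd
  % Assumption: randStream was not defined in the original snippet, so each
  % worker creates its own independent mrg32k3a stream here instead
  s = RandStream.create('mrg32k3a', 'NumStreams', numlabs, 'StreamIndices', labindex);
  r1 = rand(s, [1 1]);
  r2 = rand(s, [1 1]);
  % rng needs a nonnegative integer seed below 2^32; the ratio r1/r2 is
  % non-integer and unbounded, so a scaled, floored product is used instead
  rng(labindex + floor(1e6*r1*r2), 'twister');

  % Do your stuff
end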

Of course, r1 and r2 can be modified (or more r's added) to make the seeding more complicated.
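
Since the question asks for results that do not depend on the number of workers, here is an alternative sketch that is not part of the approach above: tie each individual's draws to an mrg32k3a substream indexed by the pair (r, it) instead of to the worker. The names NT, TT and r and the seed 3857 follow the question; the small sizes and numWorkersSim are illustrative assumptions. The second loop only mimics the round-robin assignment `it=labindex:numlabs:NT` in plain MATLAB, so the sketch runs without a pool.

```matlab
NT = 8;        % number of individuals (illustrative size)
TT = 5;        % number of time periods (illustrative size)
r  = 2;        % current Monte Carlo repetition (illustrative)
seed = 3857;   % same seed as in the question

% Serial reference: individual `it` in repetition `r` uses substream (r-1)*NT+it
X1 = zeros(TT, NT);
s = RandStream('mrg32k3a', 'Seed', seed);
for it = 1:NT
    s.Substream = (r-1)*NT + it;       % jump to the start of that substream
    X1(:, it) = 2 + 1*randn(s, TT, 1);
end

% Mimic 3 workers, each with its own stream object and a round-robin share
numWorkersSim = 3;                     % hypothetical worker count
X2 = zeros(TT, NT);
for w = 1:numWorkersSim
    sw = RandStream('mrg32k3a', 'Seed', seed);
    for it = w:numWorkersSim:NT
        sw.Substream = (r-1)*NT + it;
        X2(:, it) = 2 + 1*randn(sw, TT, 1);
    end
end

sameDraws = isequal(X1, X2);           % compares the serial and the split draws
```

Because setting `Substream` repositions the stream at the start of that substream, the draws for a given (r, it) should not depend on which worker processes it or in what order, and `sameDraws` is expected to be true. Inside a real spmd block the same pattern would apply, with the stream created on each worker and all draws taken from it explicitly (for example `randn(sw, TT, 1)`) rather than from the global stream.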