I am running a Monte Carlo simulation in MATLAB, using parallelisation because the simulation takes a very long time to run.
The main objective is to create a really big panel data set and use it to estimate some regressions.
The problem is that the simulation takes A LOT of time to run without parallelisation, so I decided to use the spmd option. However, the results from the parallelised code are very different from those of the serial code.
rng(3857);
for r=1:MCREP
    Ycom=[];
    Xcom=[];
    YLcom=[];
    spmd
        for it=labindex:numlabs:NT
            (code to generate different components, alpha, delta, x_it, eps_it)
            %e.g. x_it=2+1*randn(TT,1);
            (uses random number generator: randn)
            % Create the observations for each time period of one individual
            for t=2:TT
                yi(t)=xi*alpha+mu*delta+rho*yi(t-1)+beta*x_it(t)+eps_it(t);
                yLi(t)=yi(t-1);
            end
            % Concatenate each individual into a big matrix: create the panel
            Ycom=[Ycom yi];
            Xcom=[Xcom x_it];
            YLcom=[YLcom yLi];
        end
    end
    % Retrieve the data stored in Composite form
    mm=matlabpool('size');
    for i=1:mm
        Y(:,(i-1)*(NT/mm)+1:i*(NT/mm))=Ycom{i};
        X(:,(i-1)*(NT/mm)+1:i*(NT/mm))=Xcom{i};
        YL(:,(i-1)*(NT/mm)+1:i*(NT/mm))=YLcom{i};
    end
    (rest of the code, run regressions)
end
The intensive part of the code is the one parallelised with spmd; it creates a really large panel data set in which columns are independent individuals and rows are dependent time periods.
My main problem is that the results with parallelisation differ from the results without it; moreover, the results differ depending on whether I use 8 or 16 workers. However, the running time makes it infeasible to run the code without parallelisation.
I believe the problem comes from the random number generation, but I cannot fix the seed inside the spmd block, because that would mean fixing the seed inside the Monte Carlo loop, so all repetitions would use the same numbers.
I would like to know how to set up the random number generator so that I get the same results no matter how many workers I use.
PS. Another solution would be to put the spmd in the outermost loop (the Monte Carlo loop); however, I cannot see a performance gain when I parallelise that way.
Thank you very much for your help.
Heh... random number generation in MATLAB's parallel execution is indeed an issue.
MATLAB's web page about random number generators (http://www.mathworks.com/help/matlab/math/creating-and-controlling-a-random-number-stream.html) states that only two of the generators support multiple streams. These two have a more limited period than the default generator (see the table at the link above).
BUT!!! The default generator (mt19937ar) can be seeded in order to have different results :) Thus, what you can do is start with mrg32k3a, obtain a random number in each worker, and then use this random number along with the worker index to seed an mt19937ar generator. E.g.
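A minimal sketch of this seeding scheme, not a definitive implementation. The base seed 3857 is taken from the question; the range of the draws and the formula combining r1, r2 and the worker index are assumptions chosen only to keep the seed a valid non-negative integer below 2^32.

```matlab
spmd
    % Each worker gets its own independent mrg32k3a stream, selected by its index.
    s0 = RandStream.create('mrg32k3a', 'NumStreams', numlabs, ...
                           'StreamIndices', labindex, 'Seed', 3857);

    % Draw two integers from that stream (the range 1..2^16 is an assumption).
    r1 = randi(s0, 2^16);
    r2 = randi(s0, 2^16);

    % Combine the draws with the worker index into a seed (illustrative formula).
    seed = mod(r1*r2 + labindex, 2^32 - 1);

    % Make a freshly seeded mt19937ar stream the default on this worker,
    % so that later calls to randn/rand in the spmd block use it.
    RandStream.setGlobalStream(RandStream('mt19937ar', 'Seed', seed));

    x_it = 2 + 1*randn(5, 1);   % example draw, as in the question
end
```

With this scheme the numbers are reproducible for a fixed number of workers, but they still depend on numlabs, since both the stream and the seed are tied to the worker index.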
Of course, r1 and r2 can be modified (or, maybe, add more r's) in order to have more complicated seeding.
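As for the multi-stream generators mentioned at the start: they also support substreams, which may be a way to make the results independent of the number of workers, because the random numbers are then tied to the individual rather than to the worker. The sketch below is only an illustration under assumptions: NT, TT and r are small hypothetical values, and the substream numbering (r-1)*NT+it is one possible choice. It uses plain loops in two different orders to stand in for different work splits.

```matlab
NT = 6; TT = 4; r = 2;                      % hypothetical sizes and repetition index
s = RandStream('mrg32k3a', 'Seed', 3857);   % same stream definition on every worker

% Visit the individuals in increasing order.
X_fwd = zeros(TT, NT);
for it = 1:NT
    s.Substream = (r-1)*NT + it;            % jump to this individual's substream
    X_fwd(:, it) = 2 + 1*randn(s, TT, 1);
end

% Visit the same individuals in reverse order, as a different split might.
X_rev = zeros(TT, NT);
for it = NT:-1:1
    s.Substream = (r-1)*NT + it;
    X_rev(:, it) = 2 + 1*randn(s, TT, 1);
end

same_draws = isequal(X_fwd, X_rev);         % true: draws depend only on (r, it)
```

Inside the spmd loop, each worker would create the same stream and set s.Substream for every individual it handles before drawing from s.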