High data transfer cost in MATLAB parallel computing


In my previous question, we discussed the use of parpool('Processes'). With such a pool, if each iteration performs only a cheap operation, parfor can become very slow because of the cost of transferring data between processes. This can be demonstrated using ticBytes(gcp) and tocBytes(gcp).
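As a rough illustration of the measurement mentioned above, here is a minimal sketch of wrapping a parfor loop in ticBytes and tocBytes. It assumes a process-based pool and the Parallel Computing Toolbox; the array size and the variable names (pool, A, total) are placeholders chosen for this example, not values from my actual tests.

```matlab
% Sketch: report the bytes sent to and received from each worker.
% Assumption: the current pool (if any) is process-based, not thread-based.
pool = gcp('nocreate');            % reuse an existing pool if there is one
if isempty(pool)
    pool = parpool('Processes');   % otherwise start a process-based pool
end

A = rand(1e6, 1);                  % illustrative size only
total = 0;

ticBytes(pool);                    % start counting transferred bytes
parfor i = 1:numel(A)
    total = total + A(i);          % cheap per-iteration work (reduction)
end
tocBytes(pool)                     % print bytes transferred per worker
```

When the per-iteration work is this cheap, the transfer reported by tocBytes is the cost to compare against the computation itself.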

Based on the suggestions of @Cris Luengo and @Edric, I switched to using parpool('Threads'). It is expected that for large-scale cases, parfor should be faster than for. I have tested this using various scales ranging from 1e3 to 1e9, and the code is shown below.

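len = 8e8;              % number of elements (about 6.4 GB of doubles)
A = rand(len, 1);
sum1 = 0;
sum2 = 0;

fprintf("Using for loop: ");
tic
for i = 1:len
    sum1 = sum1 + A(i);   % serial accumulation
end
toc

fprintf("Using parfor loop: ");
tic
parfor i = 1:len
    sum2 = sum2 + A(i);   % sum2 acts as a parfor reduction variable
end
toc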

I have made some interesting observations and formulated the following statements:

  1. When the scale is small, for example 1e3, parfor is slower than for, because parfor incurs internal scheduling overhead. This overhead is larger than the cost of the "plus" operation itself.

  2. When the scale is very large, for example 1e9, parfor is still much slower than for, because the RAM of my computer (16 GB) is not sufficient to hold such a huge matrix and work on it in memory. This can be seen in the Windows Task Manager.

  3. When the scale is appropriate, for example 6e8 or 7e8, parfor is faster than for. I hope this is not a statistical error.
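To put the memory argument in statement 2 into numbers, here is a small sketch that estimates the size of A at a few of the scales mentioned. It assumes the default double type (8 bytes per element) and counts only the array itself, not any extra copies MATLAB or the pool may make; the names scales and memGB are made up for this example.

```matlab
% Sketch: memory footprint of a len-by-1 double array at several scales.
scales = [1e3 6e8 8e8 1e9];        % scales mentioned in the question
bytesPerDouble = 8;                % a MATLAB double occupies 8 bytes
memGB = scales * bytesPerDouble / 1e9;

for k = 1:numel(scales)
    fprintf("len = %g -> %.3g GB\n", scales(k), memGB(k));
end
```

At 1e9 elements the array alone takes about 8 GB, which is half of the 16 GB available before anything else is counted.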

Update: I have now averaged the timings over 10 runs at each scale. It seems that parfor is always slower than for. The figure below compares the time costs: the x-axis is the scale of the matrix and the y-axis is the time cost.
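For reference, the averaging could be set up roughly as in the sketch below. This is a simplified reconstruction under stated assumptions, not the exact script behind the figure: the scales are deliberately small so that it runs quickly, the pool is started before timing so that startup cost is excluded, and the names scales, nRuns, tFor, tPar and acc are hypothetical.

```matlab
% Sketch: average for/parfor timings over several runs at each scale.
% Assumption: a thread-based pool; start it up front so that pool
% startup is not included in the parfor timings.
if isempty(gcp('nocreate'))
    parpool('Threads');
end

scales = [1e3 1e4 1e5];            % illustrative scales, kept small
nRuns  = 10;                       % runs averaged per scale
tFor   = zeros(size(scales));
tPar   = zeros(size(scales));

for s = 1:numel(scales)
    len = scales(s);
    A = rand(len, 1);
    for r = 1:nRuns
        acc = 0;
        t = tic;
        for i = 1:len
            acc = acc + A(i);      % serial sum
        end
        tFor(s) = tFor(s) + toc(t);

        acc = 0;
        t = tic;
        parfor i = 1:len
            acc = acc + A(i);      % parallel sum (reduction variable)
        end
        tPar(s) = tPar(s) + toc(t);
    end
end

tFor = tFor / nRuns;               % mean time per scale, for loop
tPar = tPar / nRuns;               % mean time per scale, parfor loop
for s = 1:numel(scales)
    fprintf("len = %g: for %.4g s, parfor %.4g s\n", scales(s), tFor(s), tPar(s));
end
```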

[Figure: time cost versus scale of matrix for the for and parfor loops]
