Is there an efficient algorithm to redistribute a vector over processes based on how many elements each process has to give up or recieve?

147 views Asked by At

I'm trying to redistribute an array (mesh-like) over a set of processes for load-balancing needs. My special requirement is that array elements should only be moved to the spatially adjacent processes as only the elements near the front between elements can be moved easily.

enter image description here

In the above example setup, all first three processes should donate elements to the last one:

# Process, Neighbors, Nbr. of Elements to move in/out
0, (1 2), -23
1, (0 3), -32
2, (0 3), -13
3, (1 2), +68

Currently, I'm planning to implement this with blocking Two-way MPI comms where transactions happen similarly to the following:

P0 sends 23 elements to P1
P1 sends 55 elements to P3      (Should only send 32 originally, + the 23 it got from P0)
P2 sends 13 elements to P3

So, I was wondering if there is a known (easily-parallelized through Two-way MPI comms preferably) algorithm to deal with this kind of situations.

Also, I've thought about "flatting out" the processes, and considering they form a simple ring. This simplifies things but has the potential of being noisy and may not scale well:

P0 sends 23 elements to P1
P1 sends 55 elements to P2    (Even  though it's not one of its spacial neighbors)
P2 sends 68 elements to P3

Can Metis/ParMetis library handle this?

1

There are 1 answers

3
Victor Eijkhout On BEST ANSWER

I'll generalize your question: you are looking for an algorithm for load balancing where processes are connected through a graph, and can only move load to graph-connected processes. This algorithm exists: it's known as "diffusion based load balancing" and it was originally proposed by Cybenko. A simple web search will give you a ton of references.