I'm trying to redistribute an array (mesh-like) over a set of processes for load-balancing purposes. My particular requirement is that array elements may only be moved to spatially adjacent processes, since only the elements near the front between subdomains can be moved easily.
In the example setup below, the first three processes should all donate elements to the last one:
# Process, Neighbors, Number of elements to move (- out, + in)
0, (1 2), -23
1, (0 3), -32
2, (0 3), -13
3, (1 2), +68
Currently, I'm planning to implement this with blocking two-way (point-to-point) MPI communication, where the transfers proceed roughly as follows:
P0 sends 23 elements to P1
P1 sends 55 elements to P3 (it should only send its own 32, plus the 23 it received from P0)
P2 sends 13 elements to P3
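The bookkeeping behind these staged transfers can be sketched in plain Python (no MPI; the process and element counts follow the example above):

```python
# Imbalances from the example table: negative = must donate, positive = must receive.
imbalance = {0: -23, 1: -32, 2: -13, 3: +68}

def transfer(src, dst, n):
    """Move n elements from src to dst and update both imbalances."""
    imbalance[src] += n
    imbalance[dst] -= n

# Stage 1: P0 forwards its surplus to its neighbor P1.
transfer(0, 1, 23)
# Stage 2: P1 forwards its own 32 plus the 23 it received (55 total),
# and P2 sends its 13; both P1 and P2 are neighbors of P3.
transfer(1, 3, 55)
transfer(2, 3, 13)

print(imbalance)  # every entry is now 0: all processes balanced
```

Note that P1 must know how much it will receive from P0 before it can size its own send, which is what forces the stages to be ordered.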
So I was wondering whether there is a known algorithm (preferably one that parallelizes easily with two-way MPI communication) for this kind of situation.
I've also thought about "flattening out" the processes and treating them as a simple ring. This simplifies things, but it may cause unnecessary data movement and may not scale well:
P0 sends 23 elements to P1
P1 sends 55 elements to P2 (even though it's not one of its spatial neighbors)
P2 sends 68 elements to P3
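In the ring variant, the amount crossing each link follows directly from a prefix sum of the imbalances: the flow between rank i and rank i+1 is the total surplus accumulated by ranks 0..i. A quick sketch (plain Python) reproducing the three sends above:

```python
# Imbalances in rank order: negative = donates, positive = receives.
imbalance = [-23, -32, -13, +68]

flows = []      # flows[i] = elements crossing the link between rank i and i+1
carried = 0
for delta in imbalance[:-1]:
    carried += -delta   # surplus that ranks 0..i must push to the right
    flows.append(carried)

print(flows)  # -> [23, 55, 68], matching the three sends listed above
```

The downside is visible in the numbers: the 55-element send relays data through P2 that neither originated there nor belongs there.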
Can the METIS/ParMETIS library handle this?
I'll generalize your question: you are looking for a load-balancing algorithm where processes are connected through a graph and can only move load to graph-connected processes. Such an algorithm exists: it's known as "diffusion-based load balancing", and it was originally proposed by Cybenko. A simple web search will give you plenty of references.
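The core idea is simple enough to sketch in a few lines of plain Python (no MPI). Each process repeatedly trades a fixed fraction of the load difference with each of its graph neighbors; the loads and the diffusion coefficient below are illustrative values chosen to match your example topology, not part of Cybenko's formulation:

```python
# 4-process graph from the question: 0-1, 0-2, 1-3, 2-3.
neighbors = {0: (1, 2), 1: (0, 3), 2: (0, 3), 3: (1, 2)}

# Hypothetical loads consistent with the example table (average = 100):
# P0..P2 are overloaded by 23, 32, 13; P3 is underloaded by 68.
load = {0: 123.0, 1: 132.0, 2: 113.0, 3: 32.0}

def diffuse(load, alpha=0.25):
    """One diffusion sweep: each process exchanges alpha * (load difference)
    with every neighbor. Total load is conserved by construction."""
    return {i: load[i] + sum(alpha * (load[j] - load[i]) for j in nbrs)
            for i, nbrs in neighbors.items()}

for _ in range(50):
    load = diffuse(load)
print(load)  # all four loads converge toward the average of 100
```

In a real MPI implementation, each rank would exchange its current load value with its neighbors (e.g. via `MPI_Sendrecv`), compute the same local update, and then ship the corresponding elements across the front. Since only differences with direct neighbors are used, the neighbor-only movement constraint is satisfied automatically.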