For example, there are three threads.
- Thread 1 is assigned tasks 1, 2, and 3.
- Thread 2 is assigned tasks 4, 5, and 6.
- Thread 3 is assigned tasks 7, 8, and 9.
Task sizes are not uniform. The tasks assigned to a thread have very similar working sets, so the cache is used efficiently when all three of them are executed by the same thread. I should also note that the tasks will run on a NUMA system that has four nodes. Each thread must be assigned to a node of the system.
My problem is load balancing. For example, I want the Cilk scheduler to assign task 9 to thread 1 if thread 1 finishes its own tasks before the others and task 9 has not started yet.
All solutions are welcome, including Cilk Plus, OpenMP, or other schedulers freely available on the web.
Update: The threads must be assigned to nodes of the NUMA system, and the memory used by these threads must be allocated on specific nodes. I have been successfully using libnuma with OpenMP. However, I was not able to find out how to map threads to nodes using Cilk, TBB, etc. If it were possible to get the thread ID of a spawned worker in Cilk Plus, I would map it to a node using numa_run_on_node(nodeid).
For more information about scalability problems of Cilk on NUMA architectures: http://www.sciencedirect.com/science/article/pii/S0167739X03001845#
The correct way to do this in Cilk would be something like:
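A minimal sketch of that structure, not the original listing. Only main() and task1_task2_task3() are named in the explanation below; run_task(), the other two group functions, and run_all() (which stands in for main()) are hypothetical names. The inner cilk_spawn calls assume the tasks within a group may run in parallel, and the __cilk check with its serial fallback is an assumption so that the sketch also builds with a compiler that lacks Cilk support.

```cpp
#include <atomic>

// Assumption: Cilk-enabled compilers define __cilk. Without it, fall back to
// the serial elision (the keywords expand to nothing).
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_spawn
#define cilk_sync
#endif

std::atomic<int> tasks_done{0};

// Hypothetical placeholder for the real work of task n.
void run_task(int n) {
    (void)n;
    ++tasks_done;
}

// Each group keeps tasks with similar working sets together.
// Drop the inner cilk_spawn if the tasks of a group must run serially.
void task1_task2_task3() {
    cilk_spawn run_task(1);
    cilk_spawn run_task(2);
    run_task(3);
    cilk_sync;
}

void task4_task5_task6() {
    cilk_spawn run_task(4);
    cilk_spawn run_task(5);
    run_task(6);
    cilk_sync;
}

void task7_task8_task9() {
    cilk_spawn run_task(7);
    cilk_spawn run_task(8);
    run_task(9);
    cilk_sync;
}

// Stands in for main(): each cilk_spawn makes the rest of this function
// (the continuation) available for stealing by an idle worker.
void run_all() {
    cilk_spawn task1_task2_task3();
    cilk_spawn task4_task5_task6();
    task7_task8_task9();  // the last group runs on the current worker
    cilk_sync;            // wait for all spawned groups
}
```

With the inner spawns in place, an idle worker can pick up a task such as task 9 if it has not been started, which is the load balancing asked for; nothing here pins workers to NUMA nodes.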
Remember that cilk_spawn is a suggestion to the scheduler that the code after the cilk_spawn can be stolen, not a requirement. When a cilk_spawn is executed, it pushes a notation onto the tail of the worker's deque that the continuation is available for stealing. Thieves always steal from the head of the deque, so you're guaranteed that a thief will steal the continuation of main() before it steals the continuation of task1_task2_task3(). But since a worker randomly chooses which worker it will steal from, there's no guarantee that the final continuation of main() will be stolen before work from task1_task2_task3().
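The deque behaviour described above can be modelled with a toy simulation. This is only an illustration built on std::deque, not the Cilk runtime's actual data structure; WorkerDeque, push_continuation, pop_tail, steal_head, and first_stolen are hypothetical names.

```cpp
#include <deque>
#include <string>

// Toy model of a single worker's deque of stealable continuations.
struct WorkerDeque {
    std::deque<std::string> frames;

    // cilk_spawn: note at the tail that this function's continuation
    // is available for stealing.
    void push_continuation(const std::string& name) {
        frames.push_back(name);
    }

    // The owning worker resumes its most recent continuation (tail).
    // Returns an empty string if nothing is left.
    std::string pop_tail() {
        if (frames.empty()) return "";
        std::string name = frames.back();
        frames.pop_back();
        return name;
    }

    // A thief takes the oldest continuation (head).
    // Returns an empty string if nothing is left.
    std::string steal_head() {
        if (frames.empty()) return "";
        std::string name = frames.front();
        frames.pop_front();
        return name;
    }
};

// Replays the scenario: main() spawns task1_task2_task3(), which in turn
// spawns its first task. Returns the continuation a thief would get first.
std::string first_stolen() {
    WorkerDeque d;
    d.push_continuation("main");               // pushed by the spawn in main()
    d.push_continuation("task1_task2_task3");  // pushed by the spawn inside it
    return d.steal_head();
}
```

In this model a thief visiting this one deque always gets the continuation of main() first. The model has a single victim, so it does not capture the random choice of victim that removes the ordering guarantee across workers.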
Barry Tannenbaum
Intel Cilk Development