Best way to set up embarrassingly parallel code for workstation and HPC

I'm looking for the most versatile/easiest way to run an embarrassingly parallel code with changing inputs.

  • I want the final solution to be able to run on a workstation (no PBS available), but also on multiple nodes of an HPC (with PBS) with minimal modification.
  • There will be 100-1000 instances to run. Let's say runtime can be up to 8 hours.
  • Individual instances of the code do not need to communicate.
  • A single NUMA node can and should run multiple instances as long as there is enough memory/CPU for the work. Most of the code is single-threaded, and this is not easy to change to be multi-threaded. There are threaded portions where BLAS is called.
  • Cores/memory dedicated to an instance should be pinned to a NUMA node.
  • The number of runs that can happen in parallel will likely be limited by available memory. Based on that memory-limited count and the number of cores local to each NUMA node, the number of cores per instance would be calculated and assigned (see the small sizing sketch after this list).
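
Roughly the sizing arithmetic I have in mind (all numbers here are made up, just to illustrate):

    # Rough sizing sketch (made-up numbers): memory limits how many instances
    # fit on a NUMA node, and the node's cores are split among them.
    MEM_PER_NODE_GB=64; CORES_PER_NODE=16; MEM_PER_INSTANCE_GB=12
    INSTANCES_PER_NODE=$(( MEM_PER_NODE_GB / MEM_PER_INSTANCE_GB ))   # -> 5
    CORES_PER_INSTANCE=$(( CORES_PER_NODE / INSTANCES_PER_NODE ))     # -> 3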

The code itself is Fortran and takes command-line inputs. I can modify the code if need be.

I'm wondering what the best approach to the above is. I want the versatility to run on a single workstation or multiple workstations without PBS, and on an HPC with PBS.

Options in my mind:

  1. Roll my own sh script (currently using this approach on the workstation). I was using xargs in the script originally, but the string of commands was getting complex. I ended up backgrounding each run and using jobs in a while loop (roughly the pattern in the first sketch after this list). It works pretty well, but I don't see an easy way to scale this across multiple nodes in the future.
  2. MPI. Implement in Fortran directly, or use a Python wrapper of some kind? It seems like --map-by ppr:4:numa:pe=2 would start 4 processes per NUMA node and bind each process to processing elements (see the mpirun/PBS sketch after this list). Processes per NUMA node and threads per process could be calculated based on problem size and NUMA config. With compute divvied up, MPI would handle chugging through the jobs.
  3. Other?
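
For option 1, a stripped-down sketch of the background/jobs pattern I'm using today (./solver, the case count, and the job limit are placeholders; the numactl pinning shows how I'd like each instance tied to a NUMA node, and assumes numactl is installed):

    #!/usr/bin/env bash
    # Throttled background jobs: keep at most MAX_JOBS instances running,
    # round-robin them across NUMA nodes, and pin cores+memory with numactl.
    MAX_JOBS=8                                # would come from the sizing arithmetic above
    NNODES=$(numactl --hardware | awk '/^available:/ {print $2}')

    for i in $(seq 1 1000); do
        # block until a slot frees up before launching the next case
        while [ "$(jobs -rp | wc -l)" -ge "$MAX_JOBS" ]; do
            wait -n                           # bash >= 4.3; older shells would need a sleep/poll loop
        done
        node=$(( i % NNODES ))
        numactl --cpunodebind="$node" --membind="$node" ./solver "$i" &
    done
    wait                                      # let the last batch finish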
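
For option 2, the launch would look something like this with Open MPI (./driver_mpi is a hypothetical MPI wrapper, Fortran or Python, that reads cases.txt and farms the cases out to ranks; the PBS directives are just an example resource request):

    # Workstation (no PBS): 4 ranks per NUMA node, 2 cores (PEs) per rank.
    mpirun --map-by ppr:4:numa:pe=2 --bind-to core ./driver_mpi cases.txt

    # HPC: the same line inside a PBS job script, e.g.
    #   #PBS -l select=4:ncpus=32:mpiprocs=8
    #   #PBS -l walltime=08:00:00
    #   cd $PBS_O_WORKDIR
    #   mpirun --map-by ppr:4:numa:pe=2 --bind-to core ./driver_mpi cases.txt

In practice the ppr and pe numbers would come out of the memory/core arithmetic above rather than being hard-coded.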