I am running a program called dnadist from PHYLIP (http://evolution.genetics.washington.edu/phylip/doc/dnadist.html). It computes a DNA distance matrix from the set of sequences you give it as input.
Currently, I want to create a matrix from 14,778 sequences. I am submitting this to run on my university's HPCC, and by my estimate it will take about 10 days to run.
I want to request more cores to cut the run time, but I am not sure whether the computation can be split up at all. Or does it have to run entirely on 1 core? My assumption is that I would have to alter the algorithm itself to split up the matrix being produced and then concatenate the pieces back together. Is this correct?
Yes, you can parallelize; that is the main point of using HPCC. Without seeing your code it is hard to give a precise answer. I assume your code would be something like:
You can parallelize it by writing your function around the basic matrix calculation, using the PARALLEL ECL command, and running the workunit in Thor (not in HThor).
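To illustrate the "split the matrix, then concatenate" idea from the question, here is a minimal Python sketch. It is not dnadist and not ECL: `p_distance` (the plain fraction of differing sites) is only a stand-in for a real distance model, and `split_rows`, `row_block`, and `distance_matrix` are hypothetical helper names. It assumes the sequences are aligned to equal length and that each pairwise distance depends only on the two sequences involved. Under that assumption the 14,778 × 14,777 / 2 = 109,187,253 distinct pairs are independent of each other, so blocks of rows can be computed on separate cores and stitched together afterwards.

```python
from concurrent.futures import ProcessPoolExecutor


def p_distance(a, b):
    """Fraction of differing sites between two aligned, equal-length sequences."""
    # Stand-in for a real substitution model; NOT what dnadist itself computes.
    return sum(x != y for x, y in zip(a, b)) / len(a)


def split_rows(n, n_blocks):
    """Split row indices 0..n-1 into contiguous (start, stop) ranges."""
    step = max(1, -(-n // n_blocks))  # ceiling division, at least one row per block
    return [(s, min(s + step, n)) for s in range(0, n, step)]


def row_block(args):
    """Compute rows [start, stop) of the matrix against all sequences."""
    seqs, start, stop = args
    rows = [[p_distance(seqs[i], s) for s in seqs] for i in range(start, stop)]
    return start, rows


def distance_matrix(seqs, n_blocks=4, workers=None):
    """Compute row blocks in separate processes, then concatenate them."""
    tasks = [(seqs, a, b) for a, b in split_rows(len(seqs), n_blocks)]
    # pool.map returns results in task order, so the blocks are already sorted.
    # On platforms that spawn processes, call this from under
    # `if __name__ == "__main__":`.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        blocks = list(pool.map(row_block, tasks))
    return [row for _, rows in blocks for row in rows]
```

For simplicity this computes every full row, so each pair is evaluated twice; computing only the upper triangle would halve the work. On a cluster, the same row-block split could instead be handed to separate jobs that each write their block to a file, with a final step concatenating the files in row order.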