I am faced with the following problem:
I have a function called TrainModel that runs for a very long time on a single thread. When it finishes computing, it returns a function as an output argument, let's call it f. When I enquire the type of this f, Julia returns:
(generic function with 1 method)
(I am not sure of this last piece of information is useful to anyone reading this)
Now in a second step, I need to apply function f on a very large array of values. This is a step that I would like to parallelise. Having had started Julia with multiple processes, e.g.
julia -p 4
ideally, I would use:
pmap(f, my_values)
or perhaps:
aux = @parallel (hcat) for ii=1:100000000
f(my_values[ii])
end
Unfortunately, this doesn't work. Julia complains that the workers are not aware of the function f, i.e. I get a messsage:
ERROR: function f not defined on process 2
How can I make function f available to all workers? Obviously a "dirty" solution would be to run the time-consuming function TrainModel on all workers, like this perhaps:
@everywhere f = TrainModel( ... )
but this would be a waste of cpu when all I want is that just the result f is available to all workers.
Though I searched for posts with similar problems, so far I could not find an answer...
Thanks in advance! best,
N.
The approach to return the function seems elegant but unfortunately, unlike JavaScript, Julia does not resolve all the variables when creating the functions. Technically, your training function could produce the source code of the function with literal values for all the trained parameters. Then pass it to each of the worker processes, which can parse it in their environment to a callable function.
I suggest to return a data structure that contains all the information to produce the trained function: weights of an ANN, support vectors, decision rules ... Define a the "trained" function on the worker processes, such that it will utilized the trained parameters. You might want to have the ability of saving the results of the training to disk anyway, so that you can easily re-produce your computations.