I would like to compare different machine learning algorithms. As part of that, I need to be able to perform a grid search for optimal hyperparameters. However, I am not really into the idea of writing a separate implementation of a grid search for each fixed algorithm and a fixed subset of its hyperparameters. Instead, I would like it to look more like it does in scikit-learn but perhaps with not as much functionality (I do not need multiple grids, for example) and written in MATLAB.
So far I am trying to work out the logic of the yet-to-be-written grid_search.m:
function model = grid_search(algo, data, labels, varargin)
    p = inputParser;
    % here comes the list of all possible hyperparameters for all algorithms
    % I will just leave three for brevity
    addParameter(p, 'kernel_function', {'linear'});
    addParameter(p, 'rbf_sigma', {1});
    addParameter(p, 'C', {1});
    % algo, data and labels are ordinary positional arguments, so only the
    % name-value pairs are handed to the parser
    parse(p, varargin{:});
    names = fieldnames(p.Results);
    values = struct2cell(p.Results); % a cell array of cell arrays
    argsize = 2 * length(names);
    args = cell(1, argsize);
    args(1 : 2 : argsize) = names;
    % Now this is the stumbling point.
end
The calls to the grid_search function should look something like this:
m = grid_search('svm', data, labels, 'kernel_function', {'rbf'}, 'C', {[0.1], [1], [10]}, 'rbf_sigma', {[1], [2], [3]})
m = grid_search('knn', data, labels, 'NumNeighbors', {[1], [10]}, 'Distance', {'euclidean', 'cosine'})
The first call would then try every combination of the rbf kernel with the given C and rbf_sigma values:
{'rbf', 0.1, 1}
{'rbf', 0.1, 2}
{'rbf', 0.1, 3}
{'rbf', 1, 1}
{'rbf', 1, 2}
{'rbf', 1, 3}
{'rbf', 10, 1}
{'rbf', 10, 2}
{'rbf', 10, 3}
The idea behind the args variable is that it is a cell array of the form {'name1', 'value1', 'name2', 'value2', ..., 'nameN', 'valueN'}, which would later be passed to the corresponding algorithm: algo(data, labels, args{:}). The {'name1', 'name2', ..., 'nameN'} part of it is easy. The problem is that I can't understand how to create the {'value1', 'value2', ..., 'valueN'} part on each step.
I understand that machine learning terminology is not known to everybody, which is why a self-contained example follows:
Suppose the crew of the TARDIS may consist of the following classes of beings:
tardis_crew = {{'doctor'}, {'amy', 'clara'}, {'dalek', 'cyberman', 'master'}}
Since there is always just one place for a Timelord, a Companion and a Villain, please show me how to generate the following cell arrays:
{'Timelord', 'doctor', 'Companion', 'amy', 'Villain', 'dalek'}
{'Timelord', 'doctor', 'Companion', 'amy', 'Villain', 'cyberman'}
{'Timelord', 'doctor', 'Companion', 'amy', 'Villain', 'master'}
{'Timelord', 'doctor', 'Companion', 'clara', 'Villain', 'dalek'}
{'Timelord', 'doctor', 'Companion', 'clara', 'Villain', 'cyberman'}
{'Timelord', 'doctor', 'Companion', 'clara', 'Villain', 'master'}
The solution should be general, i.e. if the number of beings in a class changes or more classes of beings are added, it should still work. I would also much appreciate a step-by-step description instead of code.
PS: The non-stripped GitHub version of the original grid_search.m might give you a better idea of what I mean.
It seems what you want is to generate the Cartesian product of an arbitrary number of sets. I think this ALLCOMB function will do that for you, but if you want details of an (iterative) algorithm so you can implement it yourself, check this answer.
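In case it helps, here is a minimal sketch of one way to build that Cartesian product with the built-in ndgrid instead of ALLCOMB, using the TARDIS example from the question. The class_names variable and the output variable combos are names I made up for illustration. The idea is to enumerate every combination of indices first and then look up the actual values:

```matlab
% class_names and combos are hypothetical names for this sketch;
% tardis_crew is taken from the question.
class_names = {'Timelord', 'Companion', 'Villain'};
tardis_crew = {{'doctor'}, {'amy', 'clara'}, {'dalek', 'cyberman', 'master'}};

n = numel(tardis_crew);

% One index vector per class: 1:1, 1:2, 1:3.
% The order is reversed so that, after ndgrid, the last class varies fastest.
index_ranges = cellfun(@(c) 1:numel(c), fliplr(tardis_crew), ...
                       'UniformOutput', false);

% ndgrid returns one index array per class; together they enumerate
% every combination of indices.
grids = cell(1, n);
[grids{:}] = ndgrid(index_ranges{:});

% Flatten each grid into a column and undo the reversal, so that
% row k of idx holds the indices of the k-th combination.
idx = cellfun(@(g) g(:), grids, 'UniformOutput', false);
idx = fliplr([idx{:}]);

num_combos = size(idx, 1);
combos = cell(num_combos, 1);
for k = 1:num_combos
    args = cell(1, 2 * n);
    args(1:2:end) = class_names;              % odd slots: names
    for j = 1:n
        args{2 * j} = tardis_crew{j}{idx(k, j)};  % even slots: chosen values
    end
    combos{k} = args;
end
```

Each combos{k} is then a name/value cell array that can be expanded with combos{k}{:}, in the same way the question passes args{:} to the algorithm. Nothing here depends on the number of classes or the number of members per class.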
Edit: Thank you by the way for providing a general phrasing for people without ML knowledge.
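Since you asked for a step-by-step description, here is a rough sketch of the iterative route as well, written for the SVM grid from the question. It is only one possible way to do it, and names, values, counter and combos are illustrative variables of my own. The steps are: keep one counter per hyperparameter, build the argument list from the current counters, then advance the counters like an odometer (increment the last one, and carry to the left whenever a counter runs past the end of its list):

```matlab
% Hypothetical example data in the format used in the question.
names  = {'kernel_function', 'C', 'rbf_sigma'};
values = {{'rbf'}, {0.1, 1, 10}, {1, 2, 3}};

n = numel(values);
sizes = cellfun(@numel, values);   % number of candidates per hyperparameter
counter = ones(1, n);              % current position in each candidate list
num_combos = prod(sizes);
combos = cell(num_combos, 1);

for k = 1:num_combos
    % Build {'name1', value1, ..., 'nameN', valueN} for the current counters.
    args = cell(1, 2 * n);
    args(1:2:end) = names;
    for j = 1:n
        args{2 * j} = values{j}{counter(j)};
    end
    combos{k} = args;

    % Advance like an odometer: bump the last position and carry to the
    % left whenever a position runs past the end of its list.
    j = n;
    while j >= 1
        counter(j) = counter(j) + 1;
        if counter(j) <= sizes(j)
            break;
        end
        counter(j) = 1;
        j = j - 1;
    end
end
```

Inside grid_search you would not need to store combos at all: the model could be trained and scored right where args is built, keeping only the best result so far.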