Given any image, I want my classifier to tell whether it is a sunflower or not. How can I go about creating the second class? Putting the set of all possible images minus {Sunflower} into the second class is overkill. Is there any research in this direction? Currently my classifier uses a neural network in the final layer. I have based it on the following tutorial:
https://github.com/torch/tutorials/tree/master/2_supervised
I am using 254x254 images as input. Would an SVM help as the final layer? I am also open to using any other classifier or features that might help with this.
The standard approach in ML is:
1) Build a model.
2) Train it on data with positive/negative examples (start with a 50/50 split of positive/negative in the training set).
3) Validate it on a test set (again, aim for a 50/50 split of positive/negative examples).

If the results are not fine:
a) try a different model, or
b) get more data.
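As a rough illustration of steps 2) and 3), here is a minimal sketch of a balanced train/test split and a first evaluation. It assumes the images (or features extracted from them) are already in arrays `X` and `y`; the file names, the logistic-regression stand-in model, and the use of scikit-learn are illustrative, not part of the original Torch setup.

```python
# Minimal sketch of the train/validate loop described above. X and y are
# hypothetical: per-image features and binary labels (1 = sunflower, 0 = not).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X = np.load("features.npy")   # hypothetical file of per-image features
y = np.load("labels.npy")     # hypothetical binary labels

# stratify=y keeps the positive/negative ratio (e.g. 50/50) the same
# in both the training and the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000)  # stand-in for "some model"
clf.fit(X_train, y_train)

print("train accuracy:", accuracy_score(y_train, clf.predict(X_train)))
print("test accuracy: ", accuracy_score(y_test, clf.predict(X_test)))
```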
For case b), when deciding which additional data you need, the rule of thumb that works nicely for me is:
1) If the classifier produces a lot of false positives (it says an image is a sunflower when it is actually not a sunflower at all), get more negative examples.
2) If the classifier produces a lot of false negatives (it says an image is not a sunflower when it actually is one), get more positive examples.
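To apply this rule of thumb you need to know which error type dominates. Continuing with the hypothetical `clf`, `X_test`, `y_test` from the sketch above, counting the two error types could look roughly like this:

```python
# Count false positives vs false negatives on the test set to decide which
# kind of data to collect next (reusing the hypothetical clf/X_test/y_test).
from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_test, clf.predict(X_test)).ravel()
print("false positives (non-sunflowers called sunflower):", fp)
print("false negatives (sunflowers missed):               ", fn)

if fp > fn:
    print("-> consider collecting more negative (non-sunflower) examples")
elif fn > fp:
    print("-> consider collecting more positive (sunflower) examples")
```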
Generally, start with some reasonable amount of data and check the results; if the results on the train set or the test set are bad, get more data. Stop collecting data once the results are good enough.
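One way to judge whether more data is still paying off is to train on growing subsets and watch the validation score. Here is a sketch using scikit-learn's `learning_curve`, again on the hypothetical `X`, `y` from the first snippet:

```python
# Learning-curve sketch: train on growing fractions of the data and watch
# whether the validation score is still improving.
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.linear_model import LogisticRegression

sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"{n:5d} samples: train={tr:.3f}  val={va:.3f}")
# If the validation score has plateaued, more data of the same kind is
# unlikely to help much; if it is still rising, keep collecting.
```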
Another thing to consider: if your results with the current data and current classifier are not good, you need to work out whether the problem is high bias (bad results on both the train set and the test set) or high variance (good results on the train set but bad results on the test set). If you have a high bias problem, a more powerful classifier or better features will help; simply adding more data usually will not. If you have a high variance problem, a more powerful classifier is not what you need - instead, think about generalization: introduce regularization, or maybe remove a couple of layers from your ANN. Another possible way of fighting high variance is getting much, MUCH more data.
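A crude way to tell the two cases apart is to compare the train and test scores directly. The thresholds below are arbitrary and only meant to illustrate the diagnosis, reusing the hypothetical variables from the first sketch:

```python
# Crude bias/variance check on the hypothetical clf, X_train, X_test, etc.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

train_acc = accuracy_score(y_train, clf.predict(X_train))
test_acc = accuracy_score(y_test, clf.predict(X_test))

if train_acc < 0.8:                  # bad even on the data the model has seen
    print("looks like high bias: try a more powerful model / better features")
elif train_acc - test_acc > 0.1:     # fits the train set but not the test set
    print("looks like high variance: regularize or get much more data")
    # e.g. stronger L2 regularization (smaller C in scikit-learn):
    clf = LogisticRegression(C=0.1, max_iter=1000).fit(X_train, y_train)
else:
    print("train and test scores are close and reasonable")
```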
So to sum up, you need to use an iterative approach and increase the amount of data step by step until you get good results. There is no magic-wand classifier, and there is no simple answer to how much data you should use.