I've used this XOR example on Arduino to train a network with 2 inputs and 1 output. The dataset I have contains around 30,000 values. I used just 4 of them in place of the XOR table and got good results when I tested it.
I wanted to train it with more data, but I couldn't do that on the Arduino due to its RAM constraints, so I rewrote the code in C and trained it on my laptop instead. I gave it 20,000 values as the training dataset; it took around two days to train and I got very bad outputs. I changed the number of hidden neurons to 4 and the results are still very bad: the final error is around 12. The XOR example computes the error by summing all the individual errors, so when I give it 20,000 inputs the sum is bound to be large.
Is there a better way I can do this? Should I increase the number of layers or hidden neurons? What is the best way to fit a dataset with 30,000 values?
EDIT:
I've shared the code on GitHub: Repo
This repo contains the dataset as well as the code.
The fact that you posted a question like this means that you haven't read enough about neural networks yet (or that you don't have much experience in the field). This is no criticism; it is perfectly common, since this is a very complex field.
The solution to your question highly depends on your problem and dataset.
Speaking of "layers" usually refers to a specific artificial neural network (ANN) architecture called multilayer perceptron (MLP), so this is the architecture I'll try to explain.
Usually increasing the number of hidden layers does not give you better performance, just slower training. Sometimes using two hidden layers of perceptrons (so three layers in total: one output and two hidden, since the input layer is not made of perceptrons) can help solve particularly complex classification problems, but I've never actually seen a good 3-layer ANN.
Usually, then, when your network behaves poorly you have to change either 1) the dataset (since most of the time it's poor dataset design that leads to poor behavior) or 2) the network topology (i.e. use an ANN architecture other than the MLP).
Understanding the problem is essential, and that understanding should be "passed" to the ANN through its inputs. For instance, if you are making a fingerprint detector, you know that the image can be rotated, so if you apply a transformation that makes the input invariant to rotation (for instance, converting the image coordinates to a polar representation) you will (usually) get better performance.
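Just to illustrate the idea (the function name and the choice of centre here are mine, not anything from your code), a Cartesian-to-polar conversion in C could look like this; after it, a rotation of the original image mostly shows up as a shift along the angle axis:

```c
#include <math.h>
#include <stdio.h>

/* Hypothetical preprocessing helper (not from the linked repo):
 * convert a pixel coordinate (x, y), taken relative to an image
 * centre (cx, cy), into polar form (r, theta).  A rotation of the
 * original image then becomes mostly a shift along the theta axis,
 * which is easier for the network to cope with. */
void to_polar(double x, double y, double cx, double cy,
              double *r, double *theta)
{
    double dx = x - cx;
    double dy = y - cy;
    *r     = sqrt(dx * dx + dy * dy);
    *theta = atan2(dy, dx);              /* range: -pi .. +pi */
}

int main(void)
{
    double r, theta;
    to_polar(12.0, 5.0, 0.0, 0.0, &r, &theta);
    printf("r = %f, theta = %f\n", r, theta);   /* r = 13, theta ~ 0.39 */
    return 0;
}
```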
Remember, the most important step is the choice of the dataset. You have to avoid having too little data, but piling on more data is not automatically a good choice either. You also have to watch out for a problem known as overfitting: the network learns to recognize only the exact data you passed to it and fails on the "similar" inputs you pass it later. Moreover, the dataset has to be balanced: if you want to train a network to recognize all dogs and you only show it images of beagles, it will fail to recognize Dobermans.
This was all generic advice; note the "usually" adverb I used.
Now, for your specific case, there are two main problems.
The first one is related to the Error variable. You say that it has a high value, around 12. Well, what is 12? It's just a number. Usually you should compute the mean squared error (MSE) to estimate the performance; what you are computing instead is the sum of all the squared errors. You should divide that value by the number of patterns (Error /= PatternCount;) and drop the 0.5 factor when accumulating the squared errors. Then you can raise the Success constant a bit so that training stops earlier (maybe 0.001 is fine, but you have to tune it).
The second one is your way of operating, which is not the correct one. Usually you should divide your dataset into two parts chosen at random: the training data (usually around 80% of the points) and the test data (usually around 20%). You train the ANN with the training data, and once it is trained you pass the test data through it and measure the performance, i.e. you feed it data that never entered the training process. This way you test the ability of the ANN to generalize, not merely to remember the points you passed it.
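To make that concrete, here is a rough sketch in C of the two fixes together: a random 80/20 split of the patterns and an MSE computed over the held-out test part. All the names (forward, dataset_in, PATTERN_COUNT, ...) are placeholders of mine, not identifiers from your repo:

```c
#include <stdio.h>
#include <stdlib.h>

/* Rough sketch, not code from the linked repo: random 80/20 split of the
 * patterns and a mean squared error computed over the held-out test part. */

#define PATTERN_COUNT 30000
#define TRAIN_COUNT   (PATTERN_COUNT * 8 / 10)        /* ~80% for training */
#define TEST_COUNT    (PATTERN_COUNT - TRAIN_COUNT)   /* ~20% for testing  */

double dataset_in[PATTERN_COUNT][2];   /* fill these from your data file   */
double dataset_out[PATTERN_COUNT];
int    order[PATTERN_COUNT];

/* Placeholder forward pass -- replace with your trained network's output. */
static double forward(const double input[2]) { (void)input; return 0.0; }

/* Fisher-Yates shuffle of the index array. */
static void shuffle(int *idx, int n)
{
    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);
        int tmp = idx[i];
        idx[i] = idx[j];
        idx[j] = tmp;
    }
}

/* Mean squared error over the test patterns only. */
static double test_mse(void)
{
    double sum = 0.0;
    for (int k = TRAIN_COUNT; k < PATTERN_COUNT; k++) {
        int p = order[k];
        double err = dataset_out[p] - forward(dataset_in[p]);
        sum += err * err;              /* note: no 0.5 factor here */
    }
    return sum / TEST_COUNT;           /* divide by the number of test cases */
}

int main(void)
{
    /* ... load dataset_in / dataset_out here ... */
    for (int i = 0; i < PATTERN_COUNT; i++) order[i] = i;
    shuffle(order, PATTERN_COUNT);
    /* ... train only on order[0] .. order[TRAIN_COUNT - 1] ... */
    printf("test MSE: %f\n", test_mse());
    return 0;
}
```

With a split like this, the number you compare against Success is a per-pattern figure, so it no longer grows just because you feed in more data.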
In the end, if you really want to make a neural network work, you'll have to experiment a lot with the data you have. It's better to do that on a powerful PC than on a small Arduino Due, and to reuse the work of other people (get some libraries). The best approach I found when working at university was to use tools made specifically for numerical computation (I used MATLAB, since we had the license, but you can use Octave, which is open source, with its neural network extension). This way you can easily modify the topology, the dataset composition and the learning parameters. When you have something that works, you can extract the parameters and embed them into whatever you want (making an ad hoc implementation in C/C++/Java/Python/whatever).
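For illustration only, embedding the extracted parameters could look something like the sketch below; the 2-4-1 topology, the sigmoid activation and the zeroed weight arrays are assumptions of mine, to be replaced with whatever Octave/MATLAB gives you:

```c
#include <math.h>
#include <stdio.h>

/* Illustrative sketch of embedding a trained 2-4-1 MLP in plain C.
 * The topology, the sigmoid activation and the zeroed weights below are
 * placeholders -- export the real values from Octave/MATLAB and paste
 * them in. */

#define N_IN     2
#define N_HIDDEN 4

static const double w_hidden[N_HIDDEN][N_IN] = {
    { 0.0, 0.0 }, { 0.0, 0.0 }, { 0.0, 0.0 }, { 0.0, 0.0 }
};
static const double b_hidden[N_HIDDEN] = { 0.0, 0.0, 0.0, 0.0 };
static const double w_out[N_HIDDEN]    = { 0.0, 0.0, 0.0, 0.0 };
static const double b_out              = 0.0;

static double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

/* Forward pass of the embedded network: no training code needed at all. */
double mlp_forward(const double in[N_IN])
{
    double h[N_HIDDEN];
    for (int j = 0; j < N_HIDDEN; j++) {
        double a = b_hidden[j];
        for (int i = 0; i < N_IN; i++)
            a += w_hidden[j][i] * in[i];
        h[j] = sigmoid(a);
    }

    double out = b_out;
    for (int j = 0; j < N_HIDDEN; j++)
        out += w_out[j] * h[j];
    return sigmoid(out);
}

int main(void)
{
    double x[N_IN] = { 0.0, 1.0 };
    printf("output: %f\n", mlp_forward(x));
    return 0;
}
```

The point is that the heavy training happens once on the PC, and the microcontroller only ever runs this cheap forward pass.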
Best regards
PS: The funny thing is that I started this as a comment, then quickly ran out of space...