Very large data sets to train a neural network using simulated annealing


Since simulated annealing takes too much time even for 10-15 two-input samples on my multi-layered feed-forward network, how can I use a 100k data set to train it with 8-9 inputs?

Some guesses:

  • Sampling from random locations (for example: only 10 reads from a 200-point spiral problem, using random data each time)
  • Using data quantizers to separate a 200-point data set into 20 quantized regions, feeding 20 separate neural networks

But these cannot give the same output: the first option cannot guarantee finding which spiral owns the input locations (bad test case), and the second option needs 20x more neurons and compute power.

Taking the squared error summed over all 100k outputs at each iteration makes convergence take vastly longer than with a 10-sample sum, because the probability of finding a more stable state becomes far too low. Maybe there is a way to iterate from the first data point to the last without computing them all together (getting a better error state from just one data point is very easy, but how do you iterate through? By the time the second data point is reached, the first would be forgotten, because simulated annealing is a random process).

Example with four data points: { {0,0} {0,1} {1,0} {1,1} } ----> {0,1,1,0} (XOR) ----> easy

100k data set: coordinates of two big spirals; the NN tries to learn to distinguish them through the data. Hard.

What is the correct way?

Converging on the first data point, then the second, then ... the last, and decreasing the temperature only at the end?

Converging fully on the first data point while decreasing the temperature, then, when done, doing the same for the other data points?

Doing it in batches bigger than 10-15 takes forever.

Can we take the converged weights from two data points, take the mean of those weights, and use that?

For example, in the creature creator of a Spore-like game, when a creature has 40 legs, teaching it to walk could be hard because there will be many random situations, and the learning will need to happen in real time (while the game is running).

Most important: is simulated annealing acceptable for online learning? If yes, how? Is there any known pseudocode?
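Since the question asks for pseudocode: below is a minimal Java sketch of one common way to combine simulated annealing with random mini-batches, so each iteration evaluates the error on a small random sample of the 100k set instead of the whole thing. The network representation and `forward` here are placeholder assumptions, not any particular library's API.

```java
import java.util.Random;

public class AnnealingSketch {
    static final Random rng = new Random();

    // Placeholder for the network's forward pass with the given weights.
    static double forward(double[] weights, double[] input) {
        return 0; // assumption: replace with your feed-forward computation
    }

    // Mean squared error over a mini-batch of example indices.
    static double batchError(double[] w, double[][] in, double[] target, int[] batch) {
        double sum = 0;
        for (int i : batch) {
            double diff = forward(w, in[i]) - target[i];
            sum += diff * diff;
        }
        return sum / batch.length;
    }

    static int[] sampleBatch(int n, int size) {
        int[] batch = new int[size];
        for (int i = 0; i < size; i++) batch[i] = rng.nextInt(n);
        return batch;
    }

    static double[] anneal(double[][] in, double[] target, int nWeights) {
        double[] w = new double[nWeights]; // current weights (zero or random start)
        double temp = 1.0;                 // temperature
        final double cooling = 0.999;      // geometric cooling schedule
        final int batchSize = 32;          // mini-batch size (tunable)

        while (temp > 1e-4) {
            // Propose a small random perturbation of one weight, scaled by temperature.
            double[] candidate = w.clone();
            candidate[rng.nextInt(nWeights)] += temp * rng.nextGaussian();

            // Compare old and new weights on the SAME fresh batch, so the noisy
            // error estimates are at least measured on identical data.
            int[] batch = sampleBatch(in.length, batchSize);
            double errOld = batchError(w, in, target, batch);
            double errNew = batchError(candidate, in, target, batch);

            // Metropolis acceptance: always accept improvements, sometimes accept worse.
            if (errNew < errOld || rng.nextDouble() < Math.exp((errOld - errNew) / temp)) {
                w = candidate;
            }
            temp *= cooling;
        }
        return w;
    }
}
```

The point of the batching is that the acceptance test uses a cheap, noisy estimate of the error rather than the full 100k sum, so each iteration stays cheap at the cost of noisier acceptance decisions.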

A diffuse map, for example, trained in just a second (or two) on more than 190 data points using the GPU, and evaluated in nanoseconds to microseconds:

Before training: [image]

After training: [image]

(optional) Intensification to get hard-separated boundaries (red and blue are separated at a 0.5f threshold in this example): [image]

But this type of learning works for only two inputs (two dimensions), and for every output there must be another map.

Any free Java library that can do the kind of thing shown in these pictures would be most appreciated.

Answer by mnutsch (accepted):

Some specific additional information about what you are working on and/or code samples would be helpful.

However, here is what I suggest:

It sounds like you have a data set with 100k lines in it. Some of those lines of data are probably duplicates. Typically, with an artificial neural network, the program increases the strength of a connection between two nodes of the network when they are activated.

Rather than train your artificial neural network with one line of input at a time, perhaps a faster strategy would be:

Identify the unique lines in the input and count how often each occurs. When you train the artificial neural network, use the count of that input as a factor for how much you increase the strength of the connection between the nodes.

Instead of going through 100k training iterations, you would go through far fewer. The result should be that the overall process takes less time and processor power.
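For instance, here is a minimal Java sketch of that counting step, assuming each input line can be keyed as a string; `trainOnExample` stands in for whatever weighted update your network uses:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DedupTraining {
    // Count how often each unique input line occurs.
    static Map<String, Integer> countUnique(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) counts.merge(line, 1, Integer::sum);
        return counts;
    }

    // Train once per unique line, scaling the update by the line's frequency
    // instead of repeating the line 'count' times.
    static void train(List<String> lines) {
        for (Map.Entry<String, Integer> e : countUnique(lines).entrySet()) {
            trainOnExample(e.getKey(), e.getValue());
        }
    }

    static void trainOnExample(String line, int count) {
        // Placeholder assumption: increase connection strengths by (baseStep * count).
    }
}
```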

(If you don't want to programmatically identify and count the unique items in your data set, you can use Microsoft Excel's Pivot Table function to do this within a couple of minutes.)

Hope this helps!

Edit: I added the following text as an edit, because it was too long to add as a comment.

Thanks for adding additional details about the problem that you are trying to solve. This is a very complex question and there isn't a simple answer.

In an artificial neural network there are Nodes, which are points that can be activated, and there are connections between nodes. A connection can either work towards Activating another node or Suppressing it. Finally, the Strength of these connections can grow or weaken based on feedback; in other words, each connection has a factor which describes its Strength. Whether a node is activated by other nodes is a function of the sum of the strengths of the connections to it from already-activated nodes, where each connection's strength counts as positive or negative depending on whether it is Activating or Suppressing.
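As a minimal sketch of that activation rule (the `Connection` type and the zero threshold here are illustrative assumptions):

```java
public class Activation {
    // A connection from a source node: positive strength = Activating,
    // negative strength = Suppressing.
    record Connection(boolean sourceActive, double strength) {}

    static boolean isActivated(Connection[] incoming) {
        double sum = 0;
        for (Connection c : incoming) {
            if (c.sourceActive()) sum += c.strength(); // only activated sources contribute
        }
        return sum > 0; // the node fires if activating input outweighs suppressing input
    }
}
```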

The idea behind artificial neural networks is that rather than defining exactly how they work, you set up the basic rules and then train the tool so it grows organically. The reality is that some design still has to go into the way the artificial neurons are connected to one another.


I mentioned that identifying a 3D spiral requires a very complicated artificial neural network. When creating the basic design, it is easier to define what a basic working end product would look like and then build from that.

Specifically, let's define how an artificial neural network would look at a cross section of a spiral line to determine whether it is a line. A one-dimensional cross section of a line might be made up of three nodes (imagine three pixels side by side). Imagine these as three variables in an array, where each value can be either 0 (not activated) or 1 (activated). Let's call these nodes 1a, 2a, and 3a.

We want our minimal example of an artificial neural network to look at these three pixels and determine if there is a line in the middle. A line in the middle might be defined as either: the middle node activated with the two outer nodes not activated (a white line on a black background), or the two outer nodes activated and the middle node not activated (a black line on a white background).

We then need a second layer of nodes to determine if the cross section identified is a white line or a black line. This layer needs two nodes. Let's call them nodes 1b (white line), and 2b (black line).

Finally, we want a third and final layer with just one node. This node should get activated if a line is identified. Let's call this node 3.

Now let's define the connections between the nodes.

Node 2a should have an Activating connection to node 1b. Node 2a should have a Suppressing connection to node 2b. Nodes 1a and 3a should have a Suppressing connection to node 1b. Nodes 1a and 3a should have an Activating connection to node 2b. Nodes 1b and 2b should both have an Activating connection to node 3.

If you recall, a cross section of a line is detected if node 3 gets activated. Activation of node 3 means that this specific point in the data looks like the cross section of a line.

Here is how some data might get processed by this functioning neural network example:

**When a line cross section is present**

0,1,0 (node 1a is not activated; node 2a is activated; node 3a is not activated)

Only node 2a is activated in this data set example. This will trigger an Activating connection to node 1b and a Suppressing connection to node 2b.

In the second layer of the neural network, node 1b gets activated. This will trigger an Activating connection to node 3.

Activation of node 3 indicates that there is a cross section of a line at the point in question.

**When a line cross section is not present**

0,1,1 (node 1a is not activated; node 2a is activated; node 3a is activated)

Nodes 2a and 3a are activated in this data set example. Node 2a will trigger an Activating connection to node 1b and a Suppressing connection to node 2b. Node 3a will trigger a Suppressing connection to node 1b and an Activating connection to node 2b.

Neither node 1b nor node 2b will become activated, because the Activating and Suppressing connections balance each other out (assuming that the Strengths of the connections are equal).

Node 3 will not become activated, indicating that there is not a line segment at the location in question.
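Putting the whole walkthrough together, here is a minimal runnable Java sketch of the three-layer example, assuming every connection strength is ±1 and a node fires when its input sum is positive:

```java
public class LineDetector {
    // Connection strengths: positive = Activating, negative = Suppressing.
    // Rows: input nodes 1a, 2a, 3a; columns: hidden nodes 1b (white line), 2b (black line).
    static final double[][] W_INPUT_HIDDEN = {
        {-1, +1},  // 1a suppresses 1b, activates 2b
        {+1, -1},  // 2a activates 1b, suppresses 2b
        {-1, +1}   // 3a suppresses 1b, activates 2b
    };
    static final double[] W_HIDDEN_OUTPUT = {+1, +1}; // 1b and 2b both activate node 3

    static boolean fires(double sum) { return sum > 0; } // simple threshold activation

    static boolean detectsLine(int[] pixels) { // pixels = {1a, 2a, 3a}, each 0 or 1
        double outputSum = 0;
        for (int h = 0; h < 2; h++) {
            double hiddenSum = 0;
            for (int i = 0; i < 3; i++) hiddenSum += pixels[i] * W_INPUT_HIDDEN[i][h];
            if (fires(hiddenSum)) outputSum += W_HIDDEN_OUTPUT[h];
        }
        return fires(outputSum); // node 3 activated => line cross section detected
    }

    public static void main(String[] args) {
        System.out.println(detectsLine(new int[]{0, 1, 0})); // true: white line
        System.out.println(detectsLine(new int[]{1, 0, 1})); // true: black line
        System.out.println(detectsLine(new int[]{0, 1, 1})); // false: no line
    }
}
```

Running it prints true for the two line patterns (0,1,0 and 1,0,1) and false for 0,1,1, matching the walkthrough above.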


That was an example of what one very small part of a functioning neural network in this problem would look like.

In nature there are millions of neurons dedicated to a problem like this, across many, many layers. There would be Activating and Suppressing connections initially arranged randomly between neurons in adjacent layers.

To train the neural network, you would activate layer 1 based on the data input. If the end node (layer 3 in the simplified example) gets activated AND the input is a spiral, then you would increase the Strength of all of the Activating connections between activated neurons. In that same case you would ALSO increase the Strength of all of the Suppressing connections between neurons where the earlier-layer neuron is activated and the neuron it connects to is suppressed.

Gradually after enough training, the idea is that the values of the connections in your neural network will naturally evolve to identify a spiral.
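A sketch of that feedback rule, assuming connection strengths are kept in a matrix (positive for Activating, negative for Suppressing) and that we know which nodes fired on a correctly identified spiral:

```java
public class Reinforce {
    // strengths[i][j]: connection from node i (earlier layer) to node j (next layer).
    // Positive = Activating, negative = Suppressing.
    static void reinforce(double[][] strengths, boolean[] earlierActive,
                          boolean[] nextActive, double step) {
        for (int i = 0; i < strengths.length; i++) {
            if (!earlierActive[i]) continue; // only connections from activated neurons change
            for (int j = 0; j < strengths[i].length; j++) {
                boolean activating = strengths[i][j] > 0;
                if (activating && nextActive[j]) {
                    strengths[i][j] += step; // strengthen Activating links between co-active nodes
                } else if (!activating && !nextActive[j]) {
                    strengths[i][j] -= step; // more negative = stronger Suppressing link
                }
            }
        }
    }
}
```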


To answer your specific follow up question: "So there is no way to use all datasets to train in real-time?"

Yes, you can train on data in real time. I understood your original question as asking for a faster way to train on a really large data set.

If your program uses a weighted average of all training sessions to define the neurons' connection Strengths, then the knowledge of the prior training inputs is already incorporated into the network.
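For example, an incremental running average folds each new session's result into the stored strengths without revisiting old data. A minimal sketch under that weighted-average assumption:

```java
public class RunningAverage {
    private final double[] strengths; // current averaged connection strengths
    private long sessions = 0;        // training sessions folded in so far

    RunningAverage(int nConnections) {
        strengths = new double[nConnections];
    }

    // Incorporate the strengths produced by one new training session.
    // Equivalent to the mean of all sessions seen so far, without storing them.
    void incorporate(double[] sessionStrengths) {
        sessions++;
        for (int i = 0; i < strengths.length; i++) {
            strengths[i] += (sessionStrengths[i] - strengths[i]) / sessions;
        }
    }
}
```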