Wednesday, November 23, 2011

Neural Nets Tips and Tricks: add a recall output neuron

After my previous post about neural network frameworks, my inbox has been literally flooded!
Let me clarify something: I didn't develop my "home-made" app to compete with the major companies in this field! My experiment was done simply to show that there are so many different algorithms to train neural nets, and so many models designed to work in specific domains, that in my opinion the best approach to working with this elegant machine learning algorithm is to implement it in house. That's it!
Anyway, in these emails people suggested truly state-of-the-art models: dynamic momentum, entropy-based cost functions, and so on. I'm really happy with the collection of papers I got for free :)

As mentioned, in many cases customizing the algorithm is the only way to achieve the target, but sometimes a few tricks can improve the learning even without changing the learning strategy!
Consider for example our XOR problem solved through neural networks.
Let's see how we can considerably reduce the number of epochs required to train the net.
As you know, Back Propagation is based on the famous delta rule, and for the "hidden to output" layer the delta is equal to:
Delta rule for the "hidden to output" layer
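The original image is not reproduced here; as a reference, the standard delta-rule expressions for a sigmoid network (my assumption about the activation used in the post) are:

```latex
% Output neuron k (target T_k, output O_k):
\delta_k = (T_k - O_k)\, f'(\mathrm{net}_k) = (T_k - O_k)\, O_k (1 - O_k),
\qquad \Delta w_{kj} = \eta\, \delta_k\, h_j
% Hidden neuron j (activation h_j) sums the deltas of all outputs it feeds:
\delta_j = h_j (1 - h_j) \sum_k w_{kj}\, \delta_k
```

where $\eta$ is the learning rate; the sum over $k$ in the hidden delta is what the duplicated output neuron will reinforce.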
How can we reinforce the delta to speed up the learning process without modifying the delta rule?
One of the possible answers is: duplicate the output neuron!!
So instead of changing the strategy, we slightly modify the topology of the net to obtain the following:
The modified network: notice how the delta for the neuron h1 changes

The neuron O2 has exactly the same target T as the neuron O1: basically, the heuristic consists in duplicating the target to reinforce the delta contribution.
In the above image you can see the new contribution provided by the delta of the neuron O2; when the network finds a good direction in the gradient descent, the output of O1 will be similar to that of O2, and the delta for h1 will receive a double correction, because the delta of O1 will be pretty much the same as the delta of O2.
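To make the trick concrete, here is a minimal pure-Python sketch (not the original app's code; the 3 hidden neurons, learning rate 0.5 and 2000 epochs are my own assumptions) of plain online backprop on XOR, where `n_out=2` duplicates the output neuron with the same target T:

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# The four XOR patterns: (inputs, target)
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def train_xor(n_out, epochs=2000, lr=0.5, n_hid=3):
    """Online backprop on a 2 -> n_hid -> n_out sigmoid net.
    With n_out=2 every output neuron gets the SAME target T,
    which is the 'recall neuron' trick described in the post."""
    w1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(n_hid)]        # 2 inputs + bias
    w2 = [[random.uniform(-1, 1) for _ in range(n_hid + 1)] for _ in range(n_out)]  # hidden + bias
    for _ in range(epochs):
        for (x, t) in XOR:
            xi = list(x) + [1.0]                                       # bias input
            h = [sigmoid(sum(w * v for w, v in zip(row, xi))) for row in w1]
            hb = h + [1.0]                                             # bias for output layer
            o = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w2]
            # delta rule; every output shares the same target t
            d_out = [(t - ok) * ok * (1 - ok) for ok in o]
            # each hidden delta sums contributions from ALL output neurons
            d_hid = [hj * (1 - hj) * sum(d_out[k] * w2[k][j] for k in range(n_out))
                     for j, hj in enumerate(h)]
            for k in range(n_out):
                for j in range(n_hid + 1):
                    w2[k][j] += lr * d_out[k] * hb[j]
            for j in range(n_hid):
                for i in range(3):
                    w1[j][i] += lr * d_hid[j] * xi[i]
    # worst squared error over all patterns and all output neurons
    worst = 0.0
    for (x, t) in XOR:
        xi = list(x) + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xi))) for row in w1] + [1.0]
        o = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w2]
        worst = max(worst, max((t - ok) ** 2 for ok in o))
    return worst

err_single = train_xor(n_out=1)   # original topology
err_double = train_xor(n_out=2)   # with the duplicated "recall" output
print(err_single, err_double)
```

With `n_out=2` the hidden deltas sum the contributions of both output neurons, which is exactly the "double correction" described above; comparing `err_single` and `err_double` over several seeds gives an idea of the effect.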
...I've done my best to explain this "by practical means"; from a theoretical perspective someone could turn up their nose (I apologize for that... but the soul of this blog is to privilege intuitive explanations).
As usual let me show the effects:
I ran 10 tests to compare the original configuration with the network modified with the "recall neuron", using exactly the same number of hidden neurons and the same parameter configuration.
In 7 cases the new net reduced the learning phase by 210%. 
In 2 cases the new net took the same time as the original net.
In 1 case the new net didn't find a solution (due to oscillation problems).
Here are some representative examples:
Original Net: convergence after 800 cycles.

Net with recall neuron: convergence after 120 cycles (notice how fast the error slumps) 
Here is another example:
Another trial with original configuration (convergence after 600 cycles)

Another example of the effectiveness of the method: convergence obtained around cycle 200
To conclude, let me show how quickly the error drops from the synapses' perspective. Below is the most representative error surface for the last trial shown above.
Error surface plot for the most important connections (referring to the above net): notice again how quickly the error drops.
Stay tuned


  1. Could you explain the charts & graphs a bit better? What is the Y axis in the convergence plots? What is being plotted on the surface plots?


  2. You are right John, I forgot to specify the units of measure.
    For the graphs related to convergence, the x-axis = number of training cycles and the y-axis = squared error for the pattern (that is, (expected output - net output)^2).
    For the convergence graph related to the duplicated output, the y-axis is the worst squared error committed by the net between the first and the second output (that is,
    Max[(expected output - net output_1)^2, (expected output - net output_2)^2]).

    About the 3D graph:
    x = weight for synapse j
    y = weight for synapse k
    z = squared error committed by the net.
    Is it clearer now?

    For more details I strongly suggest the Hertz book.

  3. what a idiot man what u try to determine

  4. Hi mirza, sorry, I didn't get your message. Could you please explain your point of view a bit better?