Tuesday, July 3, 2012

Simulated Annealing: How to boost performance through Matrix Cost rescaling

One of the most widely algorithm used in Machine Learning is the Simulated Annealing (SA).
The reason of its celebrity lays in:
  • Simplicity of implementation
  • broad spectrum of applicability
There are many implementations of it available for almost all languages, and many books and papers on the topic; personally the best book I would like to suggest is the Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing wrote by Korst & Aarts.

The problem
In this post I'm going to describe a very canonic combinatoric problem (NP hard complexity) often approached through annealing methods: the complete Asymmetric Travelling Salesman Problem (ATSP).
The ATSP problem, as you know, can be taken as base to model several problems as for instance network routing problems, chemistry and physics problems. The ATSP is so charming (at least for me) because the nature of the problem is very easy and the way to find out the best solution is extremely trivial: you have to explore all the possible solutions!
The only problem is the number of solutions you have to compare:
  • with 10 towns you have to explore ~ 3.6*10^6 solution.
  • with 50 towns you have to explore ~ 3 *10^64 solution.
In the past (I'm not aware if it is still open) there were many competitions to find the best solution of the asymmetric and complete TSP with more than 30000 towns.

...Considering the time required to obtain the optimal solution, in many cases a sub optimal solution obtained exploring less than 1% (as usually happens with SA) of the solution space could be enough!

The experiments
I considered two instances of the problem, the first one with 10 towns and the second one with 50 towns.
Even if for this kind of problem there are more efficient algorithms to face TSP, I chose the simulated annealing just to show how versatile it is and to show how much the rescaling operation is essential in order to maximize the performance of your algorithm. 

Why rescaling? (I'll try to be technical as little as possible)
The SA bases its powerful on the ability to overcome the local minimum through the acceptance at time "t" of a bad solution respect a better solution found at the time "t-1". It assigns an acceptance probability based on the sigmoidal function that decrease if the cost of the new solution is higher respect the current solution.
So if the cost distance between the current solution and the new solution is very high, the acceptance probability will be very small.
This probability is mitigated by the temperature schedule that allow more acceptance rate (of bad solutions) when the temperature of the system is high, and it is reduced once the temperature of the system decreases during the exploration of the solution space.
The aim of the rescaling is to increase the distance between towns having high cost and to reduce the cost of short distance between towns.
To obtain it I apply a "quasi" n log n function to the cost matrix.
(A publication on the topic should be ready by the end of the year).

All experiments as usual as been done through "home made" application written in Mathematica.
  • The cost matrix has been built  using a uniform distribution with integer values (1 to 10 for the first experiment and 1 to 100 for the second one).
  • Just to have a reference point I assigned to the cost matrix the minimum value (1) in correspondence of the external perimeter of the "town system"
  • The ATSP with 10 towns has optimal solution cost = 9. The ATSP with 50 towns has optimal solution cost = 49.
After 2000 cycles, SA found a solution having cost = 13. (the graph on the II quadrant represents the acceptance prob.).
The initial temperature beta has been set = 5.

After 1000 cycles, SA with MatrixCost rescaled found the optimal solution having cost 9.
The initial temperature beta has been set = 5.
I repeated the same experiment increasing the initial temperature to 15.
SA launched with beta= 15. After 5000 cycles the sub optimal solution found has cost = 14

SA with matrix cost rescaled after 2000 cycles found better solution (same initialization of the above experiment)
A larger test has been done to understand the statistical behavior:
The blue line represents the solution costs of the traditional SA.
The red line represents the solution costs of the SA with matrix cost rescaled 

And to conclude, here you are the results obtained with higher complexity (50 towns).

ATSP (50 towns) comparison between traditional SA and SA with rescaled matrix cost.

Let's try to increase the number of cycles from 1000 to 3000.

The % of success of the new strategy seems increase with the increase of the number of cycles.

Let's see what happens increasing once again the number of cycles:

The assumption seems to be confirmed: the new strategy performs better than the traditional SA even if the number of cycles is higher. 
  1. Often the accuracy/performance of an algorithms depends on the way you feed it with the data! Rescaling your data is quite always essential!
  2. The strategy requires further tests and validation.
...We will discuss about the function used to rescale the matrix costs and some other application of SA on different problems in the next posts.
cheers & stay tuned
Special thx to Tom V. : he is on the top of the list of the contributors for the sponsorship program.


  1. This is important for businesses as the tools for data mining will predict the future trends and behaviors thus allowing the businesses to make proactive, knowledge driven decisions..

    Best knowledges of Knowledge Discovery through Data Mining,Data Mining Stages and so many more also..

  2. Interesting post! The technique you describe is may be in some sort similar to "search space smoothing" published here http://dx.doi.org/10.1016/S0378-4371(97)00207-0

  3. thx John for your interest to the post.
    I'm going to have a look to the paper you mentioned. It could be something of similar!
    ...we could do some comparative test to understand better under which condition the approach works better or worse.
    Did you already try to implement it?

  4. That quasi n log n function sounds interesting.
    To deal with the problem that the temperate needs to be rescaled to the problem size and time executed, I use a different approach as you can see here. Basically, I have a timeSpendGradient that goes from 0.0 to 1.0 during the solver run. I combine that with a temperature that represents the maximum delta score between 2 solutions to create automatic rescaling.

  5. Hi Geoffrey,
    thx for your contribute!
    If I'm not wrong your approach works on dynamic modification of the acceptance prob.
    There are many papers on that methods: did you do some comparison tests between your approach and the standard SA (or between other methods)?

  6. If you need any assistance regarding Surface crystallines, then Quantum Heat Treaters India Pvt Ltd Hydrogen Annealing process for Magnetic Shielding Solutions, Inert Gas Annealing, Stress Relieving Process in Coimbatore, India. To know more visit Quantum Heat Treaters