Text & Data Mining by practical means: Power Real Polynomial to approximate functions: The Gradient Method

Thursday, December 8, 2011

In the real world, a problem can rarely be solved with a single algorithm; more often the solution is a chain of algorithms in which the output of one step is the input of the next.

But, as you know, machine learning algorithms quite often return extremely complex functions that don’t fit directly into the next step of your strategy.

In these conditions the function approximation trick is really helpful: we reduce the complexity of the original model by replacing it with a new, simpler model.

The big advantage of function approximation is that you can impose the form of the new model that describes your data.

One of the most commonly used “models” is the polynomial approximation (its first-degree special case is also known as linearization).

As you can imagine, the polynomial approximation takes advantage of being a linear combination of monomials, which makes analytical approaches easier.
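To make the idea concrete, here is a minimal sketch of what such a model could look like. It assumes that "power real polynomial" means a sum of monomials with real (not necessarily integer) exponents, and that x is positive so the real powers are well defined; the function name `power_poly` and its parameters are illustrative, not taken from the original notebook.

```python
import numpy as np

def power_poly(x, coeffs, exps):
    """Evaluate sum_i coeffs[i] * x**exps[i] (assumes x > 0, real exponents)."""
    x = np.asarray(x, dtype=float)
    coeffs = np.asarray(coeffs, dtype=float)
    exps = np.asarray(exps, dtype=float)
    # Shape (n_samples, n_monomials): one column per monomial.
    monomials = x[:, None] ** exps[None, :]
    # The model is a linear combination of the monomial columns.
    return monomials @ coeffs

# Example: 2 * x**0.5 - x**2 evaluated on three points.
x = np.array([1.0, 2.0, 4.0])
y = power_poly(x, coeffs=[2.0, -1.0], exps=[0.5, 2.0])
print(y)
```

For fixed exponents the model is linear in the coefficients, which is what makes it convenient for analytical work.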

First approach: polynomial fitting (real powers) with gradient descent

To apply this approach, your simplified model must be differentiable with respect to its parameters!
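As an illustration of why differentiability matters, suppose the simplified model is a sum of monomials with real exponents, evaluated on positive samples, and the loss is the squared error. The symbols below (coefficients c_i, exponents p_i, samples x_k, loss E, target f) are illustrative notation, not taken from the original notebook. Under those assumptions, the derivatives that gradient descent needs are:

```latex
\begin{aligned}
\hat f(x) &= \sum_i c_i\, x^{p_i}, \qquad x > 0,\\
E &= \tfrac{1}{2} \sum_k \bigl(\hat f(x_k) - f(x_k)\bigr)^2,\\
\frac{\partial E}{\partial c_i} &= \sum_k \bigl(\hat f(x_k) - f(x_k)\bigr)\, x_k^{p_i},\\
\frac{\partial E}{\partial p_i} &= \sum_k \bigl(\hat f(x_k) - f(x_k)\bigr)\, c_i\, x_k^{p_i} \ln x_k .
\end{aligned}
```

Both the coefficients and the exponents can therefore be updated by stepping against their respective partial derivatives.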

I don’t want to spend time explaining how to implement it, because it is so standard and common that implementations are available everywhere on the internet.

In my toolbox there is always a notebook with a gradient descent implementation: it is so general purpose that I couldn’t live without it!
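For readers who want something concrete anyway, the following is a minimal sketch of such a loop in Python with NumPy. It is not the notebook mentioned above: the function name `fit_power_poly`, the starting values, the learning rates, and the choice of averaging the squared error over the samples are all assumptions made for illustration, and it requires x > 0 because of the logarithm in the exponent gradient.

```python
import numpy as np

def fit_power_poly(x, target, coeffs, exps, eta_c=0.01, eta_p=0.001, n_iter=2000):
    """Gradient descent on 0.5 * mean((model - target)**2); requires x > 0."""
    x = np.asarray(x, dtype=float)
    target = np.asarray(target, dtype=float)
    coeffs = np.array(coeffs, dtype=float)
    exps = np.array(exps, dtype=float)
    log_x = np.log(x)
    history = []
    for _ in range(n_iter):
        monomials = x[:, None] ** exps[None, :]   # shape (n_samples, n_monomials)
        residual = monomials @ coeffs - target    # model minus target
        history.append(0.5 * np.mean(residual ** 2))
        # d(model)/d(c_i) = x**p_i ; d(model)/d(p_i) = c_i * x**p_i * ln(x)
        grad_c = (residual[:, None] * monomials).mean(axis=0)
        grad_p = (residual[:, None] * monomials * log_x[:, None]).mean(axis=0) * coeffs
        # Separate learning rates for coefficients and exponents.
        coeffs -= eta_c * grad_c
        exps -= eta_p * grad_p
    return coeffs, exps, history

# Hypothetical setup: two monomials fitted to sin(x) on (0, 2].
x = np.linspace(0.1, 2.0, 50)
coeffs, exps, history = fit_power_poly(x, np.sin(x), coeffs=[1.0, -0.1], exps=[1.0, 3.0])
print("loss at start:", history[0], "loss at end:", history[-1])
```

The starting point here is deliberately close to the Taylor expansion of the sine; with a poorer initialization the loop may need many more iterations or different learning rates.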

Let me show a quick demonstration of it on two different cases:

In the first one I approximate Sin[x] (in a real scenario I would be really happy if my “complex” model followed a Sin function).

Polynomial approximation for Sin(x): a good approximation was obtained after 30,000 iterations.
(In red the original function, in orange the approximation)

In the second case I chose a much more complex function:

And the result for this example is:

Polynomial approximation: here the convergence required six monomials and 70,000 iterations.
The final result was obtained via "fine-tuning", which took 5,000 iterations.
(In red the original function, in orange the approximation)

In both examples the convergence was measured as the squared error with respect to the target function.

Considerations:

If you increase the number of monomials, you increase the risk of falling into a local minimum.
You should start with very small values of the learning rate eta (I usually use two different learning rates: the first one for the coefficients and the second one for the exponents).
To speed up the convergence, you should increase the value of eta incrementally.
To refine the learning, "fine-tuning" can be useful: present samples with an asymmetric distribution to focus the learning on the tricky regions.
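The last two considerations can be sketched as follows. Both helpers are hypothetical: `eta_schedule` assumes a simple linear ramp from a small starting rate to a cap, and `fine_tuning_samples` assumes that "asymmetric distribution" means drawing training points with probability proportional to their current absolute error. Other schedules and weightings would work as well.

```python
import numpy as np

def eta_schedule(step, eta_start=1e-4, eta_max=1e-2, ramp_steps=10000):
    """Start with a very small learning rate and raise it linearly up to eta_max."""
    fraction = min(1.0, step / ramp_steps)
    return eta_start + (eta_max - eta_start) * fraction

def fine_tuning_samples(x, abs_error, n_samples, rng):
    """Resample training points with probability proportional to their current error."""
    weights = np.asarray(abs_error, dtype=float) + 1e-12  # avoid an all-zero weight vector
    probs = weights / weights.sum()
    return rng.choice(x, size=n_samples, replace=True, p=probs)

rng = np.random.default_rng(0)
x = np.linspace(0.1, 3.0, 30)
# Pretend the current fit is poor only in the region x > 2.55.
abs_error = np.where(x > 2.55, 1.0, 0.01)
tuned_x = fine_tuning_samples(x, abs_error, n_samples=1000, rng=rng)
print("share of samples in the tricky region:", np.mean(tuned_x > 2.55))
```

The resampled points can then be fed to a few more gradient descent iterations, so that most updates come from the region where the approximation is still weak.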

Of course, this strategy requires a bit of experience and a good dose of patience!

Be aware that the approximation is valid only locally: if you move far away from the training region, the error increases dramatically.
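A quick way to see this effect is to fit a polynomial on one interval and evaluate it on another. In the sketch below, `np.polyfit` (an ordinary least-squares fit with integer powers) stands in for the gradient descent fit, and the intervals and the degree are arbitrary choices made for illustration.

```python
import numpy as np

# Training region: [0, 3]. Stand-in approximation: degree-5 least-squares polynomial.
x_train = np.linspace(0.0, 3.0, 50)
poly = np.polyfit(x_train, np.sin(x_train), deg=5)

def max_abs_error(x):
    """Largest absolute deviation between the polynomial and sin on the points x."""
    return np.max(np.abs(np.polyval(poly, x) - np.sin(x)))

inside_err = max_abs_error(x_train)
outside_err = max_abs_error(np.linspace(3.5, 6.0, 50))
print(f"max error inside the training region:  {inside_err:.2e}")
print(f"max error outside the training region: {outside_err:.2e}")
```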

In the next post I will show how to increase the accuracy of this technique by mixing gradient descent with simulated annealing.

Stay tuned.