Text & Data Mining by practical means: ADA boost: a way to improve a weak classifier

Thursday, October 13, 2011

ADA boost: a way to improve a weak classifier

We have seen that often a classifier is not able to do its job properly.

Why? There are many reasons:

Even if many algorithms are in principle general purpose, we know that testing the algorithm on different data set (that is on different problem) the accuracy can change a lot.
The training hasn’t been done properly: consider for example datasets having high dimensionality, how can you be sure that your classifier hasn’t affected by overfitting?
The classifier it self cannot solve such problem: e.g a perceptron cannot solve the xor problem (more generally cannot solve non linear problems).

So the idea behind ADA, is to linearly combine a set of weak classifiers to obtain a better classifier!

One of the reasons that makes ADA so powerful is its generality:

It is a meta-algorithm, that is, you can think ADA like a strategy where you can combine different weak classifiers to cooperate together to improve the overall accuracy. So in the linear combination of classifiers you can have perceptrons mixed with neural networks, decision trees and so on.

Every single classifier is specialized to solve just a piece of the problem, so combining them you obtain the overall solution of the problem.

How it works?

ADA is an iterative strategy where on each round, the weights of each incorrectly classified example are increased so that the new classifier focuses more on those examples.

Further details are available everywhere in internet, but I would suggest to focus on the distribution function that it minimizes.

This is a crucial point: the function (at least in the original implementation) is a convex distribution, and the algo tends to fall in local minimum because basically it performs a greedy process.

Again: when you use an algorithm focus on its limits!!

As usual I’m not going to report here the meta-code of algo: there are hundreds of web pages with exactly the same things and I don’t want add entropy to the Universe J

I prefer show the some graphical example done in Mathematica: I believe that an image is thousands times better than a formula to understand the underlying concepts! (…I love the theoretical aspects of the algorithms, but after the theory you must to be practical and able to implement the solution!!).

So let’s start with a 2-dimensional problem:

I ran on the same data set (not linearly separable) many SVM having different setting (of course an SVM well trained can easily solve the problem).

I choose voluntarily a weak kernel (polynomial) with a bad capacity factor values.

Here you are the hyperplanes found by the svm. As you can see, no one solution found fits properly problem.

The set of weak classifier used to build the boosted classifier

Let me show now what is the output of ADA after few iterations.

The output of ADA using the above weak classifiers. Notice how the linear combination produce a segmented classifier.

Below an of example with 3-D datasets

The set of weak hyperplanes found by the weak svms (mixed)

Here you are the ADA output after just 2 iterations:

ADA output after two iterations

and the optimal solution found after 7 iterations:

The ADA output after 7 iterations: as you can see the optimal solution has been found.

Contact me if you are interested to the notebook with the Mathematica implementation.

Stay tuned!

cristian

12 comments:

AnonymousOctober 18, 2011 at 5:17 PM
I'm interested in the notebook.

Would you please show it or send it?

edsferr@uol.com.br
ReplyDelete
Replies
ItmanOctober 26, 2011 at 6:42 AM
Sorry, I am not quite a specialist on the topic. Is "weak classifier" a term, or does it mean that the classifier works poorly?

If you mean the latter, I am here to disagree, because SVM with a polynomial kernel works pretty well for the case of learning to rank.
ReplyDelete
Replies
Cristian MesianoOctober 26, 2011 at 11:24 AM
Hi itman, as I wrote in the post an svm properly trained can solve the problem without efforts ( a polynomial kernel having degree equal 3 is enough). I overfitted the 6 svm just to simulate a weak classifier. Btw thank to your remark
ReplyDelete
Replies
ItmanOctober 26, 2011 at 12:18 PM
I choose voluntarily a weak kernel (polynomial) with a bad capacity factor values

Thank you for the clarification, I see now, but I missed when I glazed over the text. This seems to be a very interesting opportunity.
ReplyDelete
Replies
Cristian MesianoOctober 26, 2011 at 12:53 PM
Let me know if you are interested to the notebook.
Cheers
cristian
ReplyDelete
Replies
AnonymousDecember 8, 2011 at 3:22 PM
Could you post the Sourcecode for this notebook, for the 2-D Classifier as well as for the 3-D Classifier?
Or could you send the notebook to giselagauss@web.de ??
This looks very interesting to me :)
ReplyDelete
Replies
Shanthi SelvarajDecember 24, 2011 at 7:39 PM
Hi
Happy New Year.
I've read your comments. I'm a research scholar in Anna University, India. Right now I'm concntrating on AdaBoost Alg. I've appled Ada with Random Tree and ID3 algorithms. It improved the performance of Random Tree but reduced that of ID3. What might be the reason? Can you give any suggestions related to this?
ReplyDelete
Replies
Shanthi SelvarajDecember 24, 2011 at 7:43 PM
My mail id: psshanthi@yahoo.com
ReplyDelete
Replies
manaDecember 25, 2011 at 11:22 AM
Hi That was a useful read Thank you!Could you please send me the notebook mt74.mana@gmail.com.I am currently pursuing research in text mining at IIITB Bangalore India.
ReplyDelete
Replies
AntonioFebruary 9, 2012 at 6:27 AM
I'm interested in the source code/notebook of this implementation. How can I obtain it?
ReplyDelete
Replies
AnonymousOctober 9, 2012 at 3:07 AM
good
ReplyDelete
Replies
NikishaDecember 19, 2019 at 1:35 AM
I was looking for thedata migration consulting companies through which I can identify the data migration procedure accurately. I must say the solutions, as well as the services offered by your company, have helped in moving the viable data.
ReplyDelete
Replies

Add comment

Pages

Thursday, October 13, 2011

ADA boost: a way to improve a weak classifier

12 comments: