...In this post I will explain why this trick is the basis for building one of the most powerful classifiers: the support vector machine (SVM).
Consider the following set of red and blue points:
As you can see, you cannot find a straight line that divides the blue points from the red ones.
...But, as mentioned before, using the kernel trick we can find a higher-dimensional space in which these points can be separated by a straight line.
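To see why a higher dimension helps, here is a minimal Python sketch (my own illustration, not the post's Mathematica notebook): the classic feature map phi(x1, x2) = (x1, x2, x1² + x2²) lifts the 2D points into 3D, where an inner cluster and a surrounding ring become separable by a plane. The data generation below is a synthetic stand-in for the figure's points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the figure: red points in an inner cluster,
# blue points on a surrounding ring (shapes assumed, not the post's data).
red = rng.normal(0.0, 0.3, size=(100, 2))
angles = rng.uniform(0.0, 2.0 * np.pi, 100)
radii = rng.normal(2.0, 0.1, 100)
blue = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

def phi(points):
    """Lift 2D points to 3D by appending the squared distance from the origin."""
    return np.column_stack([points, (points ** 2).sum(axis=1)])

# In the lifted space the third coordinate alone tells the classes apart,
# so a horizontal plane (e.g. z = 1.5) is a valid separating hyperplane.
print("max red z: ", phi(red)[:, 2].max())   # stays well below ~1.5
print("min blue z:", phi(blue)[:, 2].min())  # stays well above ~1.5
```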
In such a space, we are looking for a straight line satisfying:
$$x_i \cdot w + b \ge +1 \quad \text{for } y_i = +1$$
$$x_i \cdot w + b \le -1 \quad \text{for } y_i = -1$$
where $x_i$ represents the points (blue or red), with $y_i = +1$ for red points and $y_i = -1$ for blue points.
$w$ and $b$ are the unknown parameters of the straight line.
The above equations can be combined into a single set of inequalities (multiplying each side by the label $y_i$ flips the second inequality into the same form as the first):
$$y_i (x_i \cdot w + b) - 1 \ge 0 \quad \forall i$$
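A quick numeric sanity check, with made-up numbers, shows how the single inequality covers both cases:

```python
# Hypothetical numbers, only to check the sign convention of the combined constraint.
w, b = (1.0, 0.0), -0.5       # an assumed separating line: x1 = 0.5
points = [((2.0, 1.0), +1),   # a "red" point:  x.w + b =  1.5 >= +1
          ((-1.0, 3.0), -1)]  # a "blue" point: x.w + b = -1.5 <= -1
for x, y in points:
    margin = y * (x[0] * w[0] + x[1] * w[1] + b) - 1
    print(margin >= 0)  # True for both: y*(x.w + b) - 1 >= 0
```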
The SVM works out the classification by mapping the points into a "kernel space" where they can be divided by a simple hyperplane.
I won't go into the technical details (the theory is based on Lagrange multipliers, kernel spaces, and the Karush-Kuhn-Tucker conditions), but if you are interested you can find an accessible yet exhaustive description in "A Tutorial on Support Vector Machines for Pattern Recognition" by Christopher Burges.
I implemented a simple routine in Mathematica that calls the LIBSVM library to classify the points above (contact me to obtain the notebook):
To obtain these results (overall classification accuracy: 99.2%) I trained the SVM using a Gaussian kernel.
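If you don't have Mathematica, here is a rough Python equivalent of that experiment (a sketch, not my notebook): scikit-learn's SVC class is itself built on top of LIBSVM, and kernel="rbf" selects the Gaussian kernel. The red/blue data below is synthetic, so the accuracy will differ from the 99.2% above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC  # SVC wraps LIBSVM under the hood

rng = np.random.default_rng(42)

# Synthetic red (inner cluster) and blue (outer ring) points,
# standing in for the figure's data.
red = rng.normal(0.0, 0.3, size=(200, 2))
angles = rng.uniform(0.0, 2.0 * np.pi, 200)
radii = rng.normal(2.0, 0.15, 200)
blue = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

X = np.vstack([red, blue])
y = np.array([+1] * 200 + [-1] * 200)  # +1 = red, -1 = blue, as in the text

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Gaussian (RBF) kernel, as in the post
clf = SVC(kernel="rbf", gamma="scale", C=1.0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```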
Another 2D example (to highlight the hyperplane, I removed the color gradations):
|Hyperplanes found (in blue) for the above points|
One of the interesting aspects of the SVM is its vector notation: it allows a complete generalization of the problem, so you can use the same algorithm to solve problems in any dimension!
For example, in a 3D scenario $x_i$ and $w$ become three-component vectors and the separating line becomes a plane; writing the dot product out explicitly, you have:
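$$x_i \cdot w + b = w_1 x_{i1} + w_2 x_{i2} + w_3 x_{i3} + b, \qquad x_i,\, w \in \mathbb{R}^3$$

while the constraints $y_i (x_i \cdot w + b) - 1 \ge 0 \;\; \forall i$ stay literally unchanged.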
Now we are ready to jump into the real world and attempt to classify text documents!
In the next post I will describe a real-world application of document classification using SVM.
As usual: STAY TUNED!!