Wednesday, June 29, 2011

Linear classifier and non linear Classifier or rather: Input space and Feature Space

Sometimes during some meeting to explain a general classification problem I notice that many people don't understand the essential concept behind the classification.
In all fairness I discovered that many people confuse very often clustering with classification, but this is another story, and I promise to treat that in other post :).
What is the core concept to understand deeply the classification problem? ...the complexity of the problem itself!
Let me be more clear, and observe the following scenario:
Imagine that you are interested to divide the blue points from the red points, and more specifically you are interested to a black box taking in input a point, it is able to assign automatically the class (red or blue) to the point.
Well, Imagine that the only tool admitted to divide the plane to separate the points is the linear function.
Mhmm.... as you can see whatever line you try to draw is not able to split the points red from the blue points.
-> A linear function able to divide the plane splitting correctly the blue points from the red points is called linear classifier.
-> A non linear function able to that is called non linear classifier. (in this case the circle equation solve the problem.... but the real world is often more complex...)
Question: is it possible map the points of the original problem in another space more complex (that is having higher dimension) where is possible to divide the points using only a linear classifier?
The answer is YES. The trick is honor of famous mathematicians, but it is very very useful to make up a linear classifier (read easy to develop and easy to reuse in other context!!) and you can find the theory everywhere in internet.
Let me show the effect with the above problem:
I took every points in 2D and i mapped in 3D space using the function (called Kernel function): 
[x,y]->[x^2, Sqrt[2]*x*y, y^2]
And I obtained the following:
The red points now are grouped together and with a linear plane (that is a linear function in three dimensional space) we are able to divide again our points!
Here you are the overall problem:
If you are interested to further details I can share the Mathematica notebook I coded to generate the above images.
You don't need the License, you can play also with the player available here:
Let me cleanup the 10 rows of code and I'll publish the source file.
See you soon!


  1. Here you are the link of the Mathematica Notebook:

  2. Hi. Yes, there are tecknics that alow one to reduce the complexity of the model. BUT! How to do this not on educational datasets? For example this could be hardly done on dataset from comunication company... But if we assume that in some situations this could be done - how to find such situations?

  3. Hi Dmitrly,
    Thanks for your comment.
    Not directly! the kernel mapping is general trick useful in combination with other techniques: for example the SVM (when you use them with kernels) uses the kernel mapping in combination with the wolf dual.
    Moreover find the right kernel is not so easy!

  4. If it is so hard to find right kernel, then your post speaks about theory, not practice)

  5. it was a practical way to understand better how a kernel mapping works!
    I explained that, because I used the kernel mapping in the next svm post.
    ...The theory behind the kernel is more complex and detailed: I tried to be as possible practical showing how it works.
    Practical doesn't mean use something without any knowledge.
    Please Consider this post like an easy and partial introduction base to the mapping widely used in the SVM.