I had a quick look at the dataset (4 components): I plotted the four possible combinations of three components in 3D and saw that the shapes were pretty much ellipsoids.
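With four components there are C(4, 3) = 4 ways to choose three of them, which is where the four 3D plots come from. Here is a minimal sketch of how those views could be enumerated. It assumes the data sit in a NumPy array with one column per component, and the helper `three_feature_views` is hypothetical. Each projection could then be passed to a 3D scatter plot.

```python
from itertools import combinations

import numpy as np


def three_feature_views(X):
    """Yield (column indices, projected data) for every 3-column subset of X."""
    for idx in combinations(range(X.shape[1]), 3):
        # keep only the three selected columns
        yield idx, X[:, list(idx)]


X = np.random.default_rng(0).normal(size=(100, 4))  # placeholder data with 4 components
for idx, view in three_feature_views(X):
    print(idx, view.shape)
```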
Here is how the call went:
me: "What kind of clustering algorithm did you use?"
him: "K-means!"
me: "The stupid k-means?"
him: "Yes. The data set is pretty regular and the points are well separated, so I believe k-means should work properly... Do you think I made a mistake in implementing it?"
me: "No, you are a fantastic developer. The problem is simpler than that: k-means just isn't suited to data organized in ellipsoidal shapes! Try visiting Wikipedia, there is a nice example with the Iris data set that explains where it fails."
him: "But the Iris data set is tricky: the points overlap. My scenario is different, it's easier!"
me: "OK, visit my blog this evening: I'll show you a nice example where the data are clearly separated, yet k-means fails!"
Figure: two sets of points. The clusters found by k-means are shown in different colors, and the two big points are the centroids.
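The original notebook isn't reproduced here, so what follows is only a minimal sketch of the same kind of experiment. It assumes NumPy and two hypothetical, very elongated Gaussian clusters: standard deviation 5 along x and 0.2 along y, with the second cluster shifted up by 3, so the two never touch. The helper names (`make_bars`, `assign`, `sse`, `kmeans`) are illustrative. `kmeans` is a plain Lloyd's algorithm with random restarts that keeps the run with the lowest sum of squared errors (SSE).

```python
import numpy as np


def make_bars(n_per_bar=500, sigma_x=5.0, sigma_y=0.2, gap=3.0, seed=0):
    """Hypothetical data: two long, thin Gaussian clusters stacked vertically."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n_per_bar)
    x = rng.normal(0.0, sigma_x, size=2 * n_per_bar)
    y = rng.normal(0.0, sigma_y, size=2 * n_per_bar) + gap * labels
    return np.column_stack([x, y]), labels


def assign(X, centers):
    """Index of the nearest centroid (squared Euclidean distance) for each point."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def sse(X, labels, centers):
    """Sum of squared distances from each point to its centroid (the k-means objective)."""
    return float(((X - centers[labels]) ** 2).sum())


def kmeans(X, k, n_init=10, max_iter=100, seed=0):
    """Plain Lloyd's algorithm with random restarts; returns the lowest-SSE run."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        # initialise the centroids on k distinct random points
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            labels = assign(X, centers)
            # move each centroid to the mean of its points (keep it if the cluster is empty)
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        labels = assign(X, centers)
        score = sse(X, labels, centers)
        if best is None or score < best[2]:
            best = (labels, centers, score)
    return best


X, y_true = make_bars()
labels, centers, km_sse = kmeans(X, k=2)

# SSE of the "correct" partition, using each true cluster's mean as its centroid
true_centers = np.array([X[y_true == j].mean(axis=0) for j in range(2)])
true_sse = sse(X, y_true, true_centers)
# cluster ids are arbitrary, so score agreement up to a label swap
agreement = max(np.mean(labels == y_true), np.mean(labels != y_true))

print(f"k-means SSE: {km_sse:.0f}, true-partition SSE: {true_sse:.0f}")
print(f"agreement with the true clusters: {agreement:.0%}")
```

With these parameters, the partition that cuts both clusters roughly in half (left/right) has a lower SSE than the true top/bottom one. More restarts therefore don't help: the k-means objective itself prefers the wrong answer.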
I really hope my dear friend is now convinced that there is no algorithm that works for every kind of problem!
Besides... I wish the datasets I get at work were as "tricky" as Iris!!!
Contact me if you want the notebook to play with the simulations.
Stay Tuned.
PS: I'll be on vacation until the end of the month... see you soon!