Comments on "Text & Data Mining by practical means: Uncertainty coefficients for Features Reduction - comparison with LDA technique" by Cristian Mesiano

Anonymous (2013-12-18):
Can you please answer this for me: in speech recognition, does PCA or LDA reduce the number of coefficients, or the number of frames, in the feature vector?

Cristian Mesiano (2012-05-10):
Hi Lev,
Exactly: for this experiment I ranked the features using this formulation.
The vectors were built from the first 3000 features scored by TF-DF and sorted in descending order of score. The feature selection was done over these vectors.
I didn't experiment much with C or the kernel parameters, because I wanted to highlight that feature selection plays a more important role than parameter optimization (of course, if you train with the wrong parameters the classifier doesn't work well :) ).
The graph you mentioned was built as follows: I sorted the features by TF-DF score, then for each feature I plotted its entropy. In this case there is a clear correlation between features with a high TF-DF score and high entropy: BUT THIS IS A TOY PROBLEM! In a real scenario this correlation is much weaker.
About the accuracy: LDA as a clustering algorithm is by nature unsupervised, but you can use it to extract features from the corpus. In the paper I used for the comparison test, the authors built the vectors using the features extracted through LDA, then trained an SVM and measured the accuracy as the true-positive rate (being a boolean case, that's not unreasonable). Of course I did the same, to compare with the same metric.
Actually, it would be better to measure the sensitivity in this case, because the two classes are not balanced in size.
Please let me know if I answered all your questions!
I hope to see you again, maybe as a follower: click "join this site" on the right panel.
Cheers,
Cristian

Lev (2012-05-09):
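The ranking described above can be sketched as follows. This is a minimal illustration of scoring each boolean feature X against the class label L with the uncertainty coefficient U(L|X) = I(L;X) / H(X); the tiny document-term matrix and labels are invented toy data, not the corpus from the post.

```python
# Rank boolean features by the uncertainty coefficient I(L; X) / H(X).
# The document-term matrix and labels are made-up toy data.
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def uncertainty_coefficient(feature, labels):
    """Fraction of the feature's entropy shared with the label."""
    h_x = entropy(feature)
    if h_x == 0.0:  # a constant feature carries no information
        return 0.0
    h_l = entropy(labels)
    h_joint = entropy(list(zip(feature, labels)))  # joint entropy H(X, L)
    mutual_info = h_x + h_l - h_joint              # I(L; X)
    return mutual_info / h_x

# Toy boolean document-term matrix: rows = documents, columns = features.
docs = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
]
labels = ["spam", "spam", "ham", "ham"]

columns = list(zip(*docs))
scores = [uncertainty_coefficient(col, labels) for col in columns]
ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

Here features 0 and 1 perfectly predict the label (score 1.0), while feature 2 is independent of it (score 0.0), so it ranks last.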
Hi Christian,
I went through your note and I have several questions.
Are the weights you used I(L,X) / H(X) for feature X?
You write: "from the above training set I extracted (after stemming and filtering process) all the words and I used them to build the boolean vectors." Does that mean that you didn't count the frequency of the words in the docs, but used only their inclusion or non-inclusion?
Why didn't you search for the best C for the SVM?
What exactly do you mean by "The above graph shows the entropy of the first 3000 features sorted by TF-DF score"? TF-IDF makes sense only over the documents+features domain, not the features domain alone, so I can't see what exactly you use for sorting. I must be misunderstanding something here.
It is also not clear to me what you measured as accuracy in the case of LDA, since it is an unsupervised method. Can you explain that a bit?
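The boolean-vector construction Lev asks about (presence/absence rather than word counts) can be sketched roughly like this; the documents are illustrative, and the post's stemming and filtering steps are reduced to a naive lowercase whitespace split.

```python
# Build boolean (presence/absence) vectors: each component records whether a
# vocabulary word occurs in the document at all, ignoring its frequency.
# Tokenization is a naive whitespace split; stemming/filtering are omitted.
def tokenize(text):
    return text.lower().split()

docs = ["the cat sat", "the cat sat on the cat mat", "dogs bark"]
vocabulary = sorted({w for d in docs for w in tokenize(d)})

def boolean_vector(text):
    words = set(tokenize(text))
    return [1 if w in words else 0 for w in vocabulary]

vectors = [boolean_vector(d) for d in docs]
```

Note that the second document contains "cat" twice, but its vector component for "cat" is still just 1, which matches the inclusion/non-inclusion reading of the post.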
Actually I'm not su...Thanks a lot for the info!<br />Actually I'm not surprise that there are app for the features selection entropy based because it is a method very general purpose and easy to implement.<br />As you know I encourage always people in implementing by your own the algorithm, because it's the best way to understand advantages and limits of it and to adapt it to your specific problem (in this case there are many variants).<br />cheers<br />c.Cristian Mesianohttps://www.blogger.com/profile/04880057603671195464noreply@blogger.comtag:blogger.com,1999:blog-7631116270195175228.post-11536485888219196392012-05-09T13:09:11.551-07:002012-05-09T13:09:11.551-07:00Here is a web-based app that uses mutual informati...Here is a web-based app that uses mutual information for feature selection http://www.simafore.com/blog/?Tag=keyconnectBR Deshpandehttps://www.blogger.com/profile/03132933574358283998noreply@blogger.com