To be honest, I'm not an expert on this kind of technology, but I have seen that it is very easy to impress people with astonishing announcements like:
"forget the occurrences, it is time for concepts; our engine is able to really understand the meaning of the search key!".
Wow! At first glance it seems that these engines can autonomously order a pizza without any human interaction.
But... there is always a "but":
1) is a "semantic seeker" always the best solution?
2) is it so complex to build a custom solution starting from our old search engine?
Well. On the first point, my opinion is that each domain has its own specific solution. For example, if the documents indexed in your docbase are mainly invoices, tabular documents (like spreadsheets) or advice notes for new orders, I don't see many advantages in using a semantic engine.
Maybe for documents containing many sentences and free text, this kind of technology could bring some added value. ...But I would like to see some honest benchmarks on that.
Even for the classification problem, it has not yet been proved that a semantic classifier works better than an unstructured classifier (I'll arrange a post on that with some real tests!).
For the second point... the only way to check it is... to try!
So let's start with a stupid search key like "turbojet airplane".
Our old full text search engine (FTSE) will retrieve docs containing both "turbojet" and "airplane".
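To make the baseline concrete, here is a minimal sketch of that behaviour: a tiny inverted index with a boolean AND over the query terms. The toy documents and the names `build_index` and `search_all` are my own assumptions for illustration, not the API of any real engine.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search_all(index, query):
    """Return ids of documents containing every term of the query (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical toy docbase, only for illustration.
DOCS = {
    1: "the turbojet airplane took off",
    2: "a jet plane with a gas turbine engine",
    3: "turbojet engines are loud",
}
INDEX = build_index(DOCS)

# Only document 1 contains both words; document 2 is about the
# same thing but uses different words, so the plain engine misses it.
print(search_all(INDEX, "turbojet airplane"))
```

Document 2 is exactly the kind of miss that the "semantic" announcements promise to fix.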
We could enrich this with our own specific (business) requirements!
For example, using several dictionaries available on the internet, I arranged a recursive (2 loops) lookup starting from the first set of "synonyms" (extended, in this trivial example, with broader terms and morphological derivatives). After that I built the graph of the connections and, to restrict the set of pertinent terms, I clustered the graph using a stupid k-means. The final list of words is the cluster containing the original word! ...No more chatting:
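The pipeline just described can be sketched roughly as follows. This is only one possible reading of it, under several assumptions of mine: the dictionary is a hypothetical in-memory dict (`TOY_DICTIONARY`) instead of real online dictionaries, each word is represented by its row of the adjacency matrix, and k-means is a plain Lloyd iteration with a deterministic farthest-point start so the result is reproducible. All function names are invented for the example.

```python
def expand(word, dictionary, loops=2):
    """Look up `word`, then the words found, `loops` times; return nodes and edges."""
    seen, edges, frontier = {word}, set(), {word}
    for _ in range(loops):
        next_frontier = set()
        for term in frontier:
            for related in dictionary.get(term, []):
                if related == term:
                    continue
                edges.add(tuple(sorted((term, related))))
                if related not in seen:
                    seen.add(related)
                    next_frontier.add(related)
        frontier = next_frontier
    return seen, edges


def adjacency_vectors(nodes, edges):
    """Represent each node by its adjacency-matrix row (self-loop included)."""
    position = {node: i for i, node in enumerate(nodes)}
    vectors = {node: [0.0] * len(nodes) for node in nodes}
    for node in nodes:
        vectors[node][position[node]] = 1.0
    for a, b in edges:
        vectors[a][position[b]] = 1.0
        vectors[b][position[a]] = 1.0
    return vectors


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans_cluster_of(word, nodes, vectors, k=2, max_iter=50):
    """Plain k-means; return the cluster (set of nodes) that contains `word`."""
    k = min(k, len(nodes))
    # Deterministic farthest-point initialisation, starting from the seed word.
    centroids = [list(vectors[word])]
    while len(centroids) < k:
        farthest = max(nodes, key=lambda n: min(sq_dist(vectors[n], c) for c in centroids))
        centroids.append(list(vectors[farthest]))
    assignment = {}
    for _ in range(max_iter):
        new_assignment = {
            n: min(range(k), key=lambda i: sq_dist(vectors[n], centroids[i]))
            for n in nodes
        }
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        for i in range(k):
            members = [vectors[n] for n in nodes if assignment[n] == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return {n for n in nodes if assignment[n] == assignment[word]}


def enrich(word, dictionary, loops=2, k=2):
    """Recursive lookup -> graph -> k-means -> cluster containing the original word."""
    seen, edges = expand(word, dictionary, loops)
    nodes = sorted(seen)
    return kmeans_cluster_of(word, nodes, adjacency_vectors(nodes, edges), k)


# Hypothetical dictionary: "jet" drags in unrelated senses (stream, spurt).
TOY_DICTIONARY = {
    "turbojet": ["jet engine", "turbofan", "jet"],
    "jet engine": ["turbojet", "turbofan"],
    "turbofan": ["turbojet", "jet engine"],
    "jet": ["turbojet", "stream", "spurt"],
    "stream": ["jet", "spurt"],
    "spurt": ["jet", "stream"],
}
print(sorted(enrich("turbojet", TOY_DICTIONARY)))
```

On this toy dictionary the second loop pulls in "stream" and "spurt" through the ambiguous word "jet", and the clustering step is what throws them out again, keeping only the engine-related words around "turbojet".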
for the word "turbojet" the set of words to enrich our old FTSE is:
and for the word "airplane":
In this case too, I painted the selected cluster of words in orange.
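How the selected words are then fed back into the old engine is up to you. A minimal sketch of one option, assuming the engine can be driven with "OR inside each group of words, AND across the groups", could look like this. The two groups below are hypothetical stand-ins for the orange clusters, and `expanded_search` is a name I made up for the example.

```python
import re


def contains(text, term):
    """True if `term` (possibly multi-word) occurs in `text` as a whole phrase."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, text.lower()) is not None


def expanded_search(docs, groups):
    """AND across groups, OR inside each group of related terms."""
    return {
        doc_id
        for doc_id, text in docs.items()
        if all(any(contains(text, term) for term in group) for group in groups)
    }


# Hypothetical toy docbase and hypothetical clusters, only for illustration.
DOCS = {
    1: "The turbojet airplane took off",
    2: "A jet engine powers this aircraft",
    3: "Turbojet engines are loud",
}
GROUPS = [
    {"turbojet", "jet engine", "turbofan"},  # stand-in cluster for "turbojet"
    {"airplane", "aircraft", "plane"},       # stand-in cluster for "airplane"
]

# Document 2 never mentions "turbojet" or "airplane", yet it is now retrieved.
print(sorted(expanded_search(DOCS, GROUPS)))
```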
The method is very general and tunable because:
1) you can select or deselect your preferred dictionaries (depending on your data domain),
2) you can enrich/impoverish the set of words by tuning the number of loops for the recursive search,
3) you can choose different clusters.
Of course the latest FTSEs are faster and more sophisticated, but we can extend the life of our old engine with little maintenance.
I love simplicity :) ...Occam's razor.