A project to build an innovative Information Management tool to extract, correlate and expose unstructured information.
A screenshot of one of the mind maps generated by the tool. Different colors depict sub-concepts.
Some time ago I published a few posts on this blog presenting algorithms I designed to extract relevant information from documents and, more generally, from unstructured content (such as tweets, blog posts, and web pages).
I don't want to spend too many words on it; I think a demo describes the idea I have in mind better.
It's still a prototype and a lot of work remains, but I hope this video conveys the idea.
In the video I test the prototype of the application on a Wikipedia page.
PS
For the best viewing experience, watch the video on YouTube with the HD quality option.
Looking forward to receiving your feedback!
Stay Tuned
cristian
Some questions about it:
- What is your definition of a phrase?
- How does your algorithm select the most relevant ones? Does it rank by frequency?
- Does concept equal word? How do you define a sub-concept?
- What kind of information retrieval algorithm do you have in mind? (Input, output, logic)
- What is the meaning of the arrows' direction? Syntactic ( x->y ::= word x precedes word y)?
The graphs seem reasonable, but I was wondering what the MindGraph would look like if we applied the technique to the entire English Wikipedia. What do you expect?
I see some clear overlaps with at least two pieces of research I am currently working on. If you like, we can have a chat about it. :)
Thanks,
michele.
Ciao Michele,
-> The definition of a phrase is determined by the algorithm: it decides autonomously how to chunk the text into phrases (no punctuation rules are used).
-> The ranking is not based on a frequency approach. It relies on graph theory methods (it's part of the research I'm working on).
-> The sub-concepts are defined by a graph clustering technique (for the time being I'm using something standard).
-> The arrow: x->y ::= word x precedes word y.
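To make the arrow definition concrete, here is a minimal Python sketch of a directed graph in which an edge x->y means that word x precedes word y. It is only an illustration under stated assumptions, not the prototype's code: the function name `build_precedence_graph`, the naive tokenization, and the reading of "precedes" as "immediately precedes" are all hypothetical, and the phrase chunking, ranking, and clustering steps are not reproduced.

```python
import re
from collections import defaultdict


def build_precedence_graph(text):
    """Return {x: {y: count}}, where an edge x -> y means that word x
    immediately precedes word y somewhere in the text.

    Hypothetical helper for illustration only.
    """
    # Assumption: naive tokenization into lowercase alphabetic runs.
    words = re.findall(r"[a-z]+", text.lower())
    graph = defaultdict(lambda: defaultdict(int))
    # Pair every word with its successor and count each directed edge.
    for x, y in zip(words, words[1:]):
        graph[x][y] += 1
    return graph


graph = build_precedence_graph("the cat sat on the mat")
for x, successors in graph.items():
    for y, count in successors.items():
        print(f"{x} -> {y} (seen {count} time(s))")
```

The edge counts are kept only to show that repeated word pairs collapse into a single arrow; they are not meant to suggest a frequency-based ranking.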
It doesn't make sense to build one graph for the entire Wikipedia: it's much more helpful to aggregate homogeneous information and find relationships.
The idea is to have a set of mindgraphs (I like your definition!), one for each document.
Sure, we can set up a virtual coffee whenever you want!
cheers
c.