Introduction before you start
I got many clarification requests about the Waiting Time Polynomials I published on the blog in the last three posts.
The paper is almost ready to be submitted for review, but I think that some technical explanation might be interesting also for not academic audience.
I consider myself a curious and hungry seasoned student, and I know how can be tedious read formulas and mathematical passages especially when it comes from a blog!!
So why technical explanations?
The answer is in the following quote of one of my favourite scientists, Gregory Chaitin. In "The quest for Omega" he wrote:
The books I loved were books where the author’s personality shows through, books with lots of words, explanations and ideas, not just formulas and equations! I still think that the best way to learn a new idea is to see its history, to see why someone was forced to go through the painful and wonderful process of giving birth to a new idea! To the person who discovered it, a new idea seems inevitable, unavoidable. The first paper may be clumsy, the first proof may not be polished, but that is raw creation for you, just as messy as making love, just as messy as giving birth! But you will be able to see where the new idea comes from. If a proof is “elegant”, if it’s the result of two-hundred years of finicky polishing, it will be as inscrutable as a direct divine revelation, and it’s impossible to guess how anyone could have discovered or invented it. It will give you no insight, no, probably none at all.
That's the spirit that leads the following explanation!
Definition of the problem
I got many clarification requests about the Waiting Time Polynomials I published on the blog in the last three posts.
The paper is almost ready to be submitted for review, but I think that some technical explanation might be interesting also for not academic audience.
I consider myself a curious and hungry seasoned student, and I know how can be tedious read formulas and mathematical passages especially when it comes from a blog!!
So why technical explanations?
The answer is in the following quote of one of my favourite scientists, Gregory Chaitin. In "The quest for Omega" he wrote:
The books I loved were books where the author’s personality shows through, books with lots of words, explanations and ideas, not just formulas and equations! I still think that the best way to learn a new idea is to see its history, to see why someone was forced to go through the painful and wonderful process of giving birth to a new idea! To the person who discovered it, a new idea seems inevitable, unavoidable. The first paper may be clumsy, the first proof may not be polished, but that is raw creation for you, just as messy as making love, just as messy as giving birth! But you will be able to see where the new idea comes from. If a proof is “elegant”, if it’s the result of two-hundred years of finicky polishing, it will be as inscrutable as a direct divine revelation, and it’s impossible to guess how anyone could have discovered or invented it. It will give you no insight, no, probably none at all.
That's the spirit that leads the following explanation!
Given an alphabet of 3 elements $\{X_1,X_2,X_3\}$, the function $w(X_i) $ counts the number of failed trials before the last event $ X_i $.
Consider now the following configuration: \[ \{\left\vert{X_1}\right\vert =i , \left\vert{X_2}\right\vert =j,\left\vert{X_3}\right\vert =k\}: i+j+k= Z \wedge i,j,k>0 \]
Step I: Find all the possible configurations of events
How can we list the sequences of length $Z$ that can be built with $ \{\left\vert{X_1}\right\vert =i , \left\vert{X_2}\right\vert =j,\left\vert{X_3}\right\vert =k\}: i+j+k= Z \wedge i,j,k>0$ ?
Consider now the following configuration: \[ \{\left\vert{X_1}\right\vert =i , \left\vert{X_2}\right\vert =j,\left\vert{X_3}\right\vert =k\}: i+j+k= Z \wedge i,j,k>0 \]
- What are the admitted sequences $\{w(X_1),w(X_2),w(X_3)\}$ ?
Step I: Find all the possible configurations of events
How can we list the sequences of length $Z$ that can be built with $ \{\left\vert{X_1}\right\vert =i , \left\vert{X_2}\right\vert =j,\left\vert{X_3}\right\vert =k\}: i+j+k= Z \wedge i,j,k>0$ ?
![]() |
| Example of overall waiting time $w(x_i)$ in a succession of events. |
- once we set the values of the first two variables, the third it's determined by $Z-i-j$.
- we imposed that all the variables occur at least once, so we $X_1$ can assume all the values between $[1,Z-2]$.
- for each value of $X_1$ the variable $X_2$ can assume values between $[1,Z-i]$.
- $p_i$ is the probability that $X_i$ occur in a Bernullian trial.
$ \sum_{i=1}^{Z}\sum_{j=1}^{Z-i}\sum_{k=1}^{Z-i-j}{p_1^ip_2^jp_3^k}$
In the first two summations, $i$ assumes values between $[1,Z]$ just to keep the formula cleaned.
...I let you proof why the result doesn't change :).
last point about this step:the limit of the above summation $ Z \rightarrow \infty = \frac{p_1 p_2 p_3}{\left(p_1-1\right) \left(p_2-1\right) \left(p_3-1\right)}$
Such limit will be used to build the probabilistic density function.
Curiosity (helpful for complexity analysis...):
last point about this step:the limit of the above summation $ Z \rightarrow \infty = \frac{p_1 p_2 p_3}{\left(p_1-1\right) \left(p_2-1\right) \left(p_3-1\right)}$
Such limit will be used to build the probabilistic density function.
Curiosity (helpful for complexity analysis...):
- The number of sequences that can be built with vectors of length $[3,Z]$ are $\binom{Z}{3}$
- The number of sequences that can be built with vectors of length $Z$ are $\binom{Z}{2}$
What's the easiest way to describe the overall waiting time for an event in a finite succession?
There are many ways to get the $w(x_i)$, the easiest I found is given by the position of the last occurrence of $x_i$ minus the number of occurrences of $x_i$.
For instance, let's consider $w(x_1)$:
There are many ways to get the $w(x_i)$, the easiest I found is given by the position of the last occurrence of $x_i$ minus the number of occurrences of $x_i$.
For instance, let's consider $w(x_1)$:
- The position of the last occurrence of $x_1= 8$;
- $\left \vert{X_1} \right \vert = 4 $
- $w(X_1)=4$
The first two steps explain the circled pieces of the formula:
What the "overall waiting time" for?
For each event $X_i$ we are counting the holes among all the occurrences, so smaller is the overall waiting time, closer each other are the events $X_i$: it's a measure of proximity for the occurrences of $X_i$.
What I did, is to extend such measure (it would be interesting to prove that it's really a measure!) to different kind of events (aleatory variables) ${X_1, X_2,...,X_n}$ over the discrete line of the time.
Applications
There are several area for which such kind of analysis might be helpful, I showed last time an its application as powerful document classifier, where each variable $X_i$ is a word of a document.
If we consider a document as a succession of $Z$ words, the proximity measure inducted by the waiting time polynomials is a sort of finger print for the document, since for similar documents we expect that the same words are characterised by similar overall waiting time.
Moreover, the dependency among the words are considered, since we are taking in account simultaneously an arbitrary number of words (the alphabet ${X_1, X_2,...,X_n}$).
Moreover, the dependency among the words are considered, since we are taking in account simultaneously an arbitrary number of words (the alphabet ${X_1, X_2,...,X_n}$).
In the next step I'll explain the logic to get the remaining pieces of the puzzle, that will make easier the generalisation of the approach to an arbitrary alphabet.
Stay Tuned!
cristian


I really appreciate information shared above. It’s of great help. If someone want to learn Online (Virtual) instructor lead live training in Data Mining, kindly contact us http://www.maxmunus.com/contact
ReplyDeleteMaxMunus Offer World Class Virtual Instructor led training on Data Mining. We have industry expert trainer. We provide Training Material and Software Support. MaxMunus has successfully conducted 100000+ trainings in India, USA, UK, Australlia, Switzerland, Qatar, Saudi Arabia, Bangladesh, Bahrain and UAE etc.
For Free Demo Contact us:
Name : Arunkumar U
Email : arun@maxmunus.com
Skype id: training_maxmunus
Contact No.-+91-9738507310
Company Website –http://www.maxmunus.com
Really useful information. we are providing best data science online training from industry experts.
ReplyDeleteThank you for sharing your article. Great efforts put it to find the list of articles which is very useful to know, Definitely will share the same to other forums.
ReplyDeleteData Science Training in chennai at Credo Systemz | data science course fees in chennai | data science course in chennai velachery | data science course in chennai with placement
Just stumbled across your blog and was instantly amazed with all the useful information that is on it. Great post, just what i was looking for and i am looking forward to reading your other posts soon!
ReplyDeleteData science training in Bangalore
Data science online training
Great collection and thanks for sharing this info with us. Waiting for more like this.
ReplyDeleteData Science Course in Chennai
Data Science Training in Chennai
Data Science Training in Anna Nagar
R Training in Chennai
R Programming Training in Chennai
Machine Learning Course in Chennai
Machine Learning Training in Chennai
Data Science Course in Chennai
Thanks for sharing, check out
ReplyDeleteData Mining Service Providers
Data Mining Process
Amazing content.
ReplyDeleteData Mining Process
My rather long internet look up has at the end of the day been compensated with pleasant insight to talk about with my family and friends.Surya Informatics
ReplyDeleteThe great information that you shared. It will help all of them. Thanks for posting. Keep maintain the updates
ReplyDeletePHP Development Company
|
Ecommerce Solution Provider
|
Data Extraction Solutions
|
data science course bangalore is the best data science course
ReplyDeleteGreat Article
ReplyDeleteData Mining Projects
Python Training in Chennai
Project Centers in Chennai
Python Training in Chennai
nice blog,Thank you for sharing your article. Great efforts put it to find the list of articles which is very useful to know,we are providing best data services like Data mining
ReplyDeleteData Appending
data cleansing
email marketing
to enhance your business We also offer the first 100 leads for free. In case you don’t like our work, we offer a no questions asked 120% money-back guarantee too! Contact us now and let us do the trick for you.
BIZPROSPEX
I am looking for and I love to post a comment that "The content of your post is awesome" Great work!
ReplyDeletedata science course
This comment has been removed by the author.
ReplyDelete