Text & Data Mining by practical means: Waiting Time Polynomials: tech explanation

Friday, May 2, 2014

Waiting Time Polynomials: tech explanation - last part

This is the last step to complete the explanation of the waiting time polynomials formula.

Unfortunately it's a bit technical, but I'm sure that it can be understood without deep math knowledge.
At the very end if you can't explain it simply you don't understand it well enough! (A. Einstein)

The trick
Last time we left with the tally of overall waiting time $ w(x_i) = \phi(x_i)-|x_i| $ where $\phi(x_i)$ returns just the position of the last $|x_i|$ in the vector $V$.
Let's have a look at the following example that will be used during the remaining steps.

There are two questions that might be answered:

given $|x_1|= i, |x_2|= j, |x_3|= k $ what are the tuples $\{w(x_1),w(x_2),w(x_3)\}$?
given a $\{w(x_1)=I,w(x_2)=J,w(x_3)\}=K$, how many vectors $V$ can be built?

To answer to the first question, I noticed that the three coloured cells are the only ones that really count.
The idea is the following:

consider the three cells as placeholders
analyse the admitted values for each of them
replace the placeholders with all the possible permutations of the alphabet $\{x_1,x_2,x_3\}$.

Let's start with the case depicted in the above image, where we assumed that $\phi(x_1) < \phi(x_3) < \phi(x_2) $, then we have the following facts:

$ \phi(x_1)$ can take values between: $0 \leq \phi(x_1) \leq |x_1|+|x_2|+ |x_3|-2$
$ \phi(x_2)$ can take just one value: $|V|=|x_1|+|x_2|+ |x_3|$
The upper bound of $ \phi(x_3)$ is $|V|-1$ because it can slide till the second last element of $V$, that is $\phi(x_2)-1$
what about the lower bound of $\phi (x_3)$? We have two cases depicted in the below image:

To sum up, so far we explained the following part of the formula (I hope you don't mind that I changed a bit the indexes notation):

We have now to consider that for each configuration of $\{w(x_1)=I,w(x_2)=J,w(x_3)\}=K$ we can have more than one vector $V$.
Do you like combinatorics? The remaining part it's just matter of tally, and the only formula we need is the formula for permutation with repetitions. I let you rub up the concept on your favourite website for trivial combinatorics.

The formula can be split in two chunks, because we have to blocks of cells to be filled

In how many ways can we fill the cells between the positions $[1,\phi(x_1)]$?
In how many ways can we fill the cells between the positions $[phi(x_1),phi(x_2)]$?

Let's answer the first question we have to find the values for the denominator of the following:

\[\frac{(\phi(x_1)-1)!}{\#(x_1)!\#(x_2)!\#(x_3)!}\]

we have $|x_1|-1$ cells that can be filled.
it contains all the instances of $x_1$ (except for the last the occupied $\phi(x_1)$)
the number of $x_3$ instances depends by $\phi(x_1)$ and $\phi(x_3)$:

the computation of the number of instances of $x_2$ in the first slot is straightforward, and it can easily derived by difference:

$\frac{(\phi(x_1)-1)!}{(|x_1|-1)!(|x_3|-Min(|x_3|,\phi(x_3)))!\#(x_3)!}$;
$(|x_1|-1)+(|x_3|-Min(|x_3|,\phi(x_3)))+\#(x_2)=\phi(x_1)-1$
so $ \#(x_2)= \phi(x_1)-|x_1|-(|x_3|-Min(|x_3|,\phi(x_3))) $

This explains the following boxed part of the formula:

The final step is to count in how many ways we can fill the slot 2 depicted by the interval $[phi(x_1),phi(x_2)]$ and to make the formula more readable let's rename $(|x_3|-Min(|x_3|,\phi(x_3))= \epsilon$.

As we did for the first slot we have to identify the values of the denominator of the below formula:

\[\frac{(\phi(x_3)-\phi(x_1)-1)!}{\#(x_2)!\#(x_3)!}\]

Out of $|x_3|$ instances, $\epsilon$ have been placed in the slot 1, so the slot 2 contains exactly $|x_3|-1- \epsilon$.
again by difference we can get the instances of $x_2$:

the occurrences of $x_2$ before $\phi(x_3)$ are exactly $\phi(x_3)- (|x_1|+|x_3|)$
the occurrences of $x_2$ in the slot 1 (listed above) are: $ \#(x_2)= \phi(x_1)-|x_1|-\epsilon $
that is : $ \#(x_2)=\phi(x_3)- (|x_1|+|x_3|)- \phi(x_1)+|x_1|+ \epsilon$
finally we have: $ \#(x_2)=\phi(x_3)-\phi(x_1)-|x_3|+ \epsilon$

That's complete the proof of the formula.
It's quite easy now extend the formula to more variables. The only expedient to make it easier is to remove from the formula the operator $Min$ splitting the formulas in two branches.
I'll show it in paper.
Note about Complexity
What's the most painful point of this formula?
... The introduction of placeholders requires to apply the formula for each permutation of the variables involved. It means that having $k$ variables we need to calculate the formula $k!$
Anyway I don't expect to use the formula with large set of variables, after all the principle of selecting the right and small set of features is always recommended!

As usual Stay Tuned.
Cristian.

4 comments:

leoJune 27, 2019 at 1:03 AM
Amazing content.
Data Mining Process
ReplyDelete
Replies
AnonymousFebruary 15, 2022 at 10:38 AM

Thank you for providing this blog really appreciate the efforts you have taken into curating this article if you want you can check out data science course in bangalore they have a lot to offer with regards to data science in terms of training and live projects.
ReplyDelete
Replies
WlitporroagoApril 23, 2022 at 7:20 PM
Wlitporroago Patrick Burrell https://wakelet.com/wake/nw88eUTX4ZztS6jvyL2dv
recalnide
ReplyDelete
Replies
Help With ExamFebruary 20, 2025 at 11:05 PM
If you're looking for expert assistance, you can pay to take my online exam and get professional support. It’s a smart way to manage your workload and secure top grades. Don't hesitate to seek help when needed!
ReplyDelete
Replies

Add comment

Pages

Friday, May 2, 2014

Waiting Time Polynomials: tech explanation - last part

4 comments: