Saturday, June 7, 2014

Fractals: a deterministic and recursive algorithm to reproduce it (Part II)

Fern fractal approximation through a recursive and deterministic algorithm.

A fractal can be described as the fixed point of a Markov process: given the proper contractive maps, it's easy to generate it.
The question is: given the contractive maps, is it possible to generate the fractal using a purely deterministic approach?

We already observed that the iteration of the contractive maps, starting from a fixed point that lies on the border of the fractal, returns points of the convex hull of the fractal.

  • What happens if we recursively apply this scheme to the points obtained from the above procedure?
  • Is it possible to recreate the Markov process (or at least approximate it), removing any probabilistic aspect?
The algorithm below, given the contractive maps of the fractal, returns at each iteration a better approximation of it.

At each iteration it returns a deeper description of the contour of the original image.

The Algorithm (in practical terms)
  1. Consider a fixed point $P_0$ of a contractive map $\tau_1$ that lies on the convex hull of the fractal.
  2. Choose randomly a point $P_i$ of the fractal and apply the above contractive map until the distance to $P_0$ is negligible.
  3. Map the point calculated at step 2 using sequentially all the contractive maps.
  4. Map each point obtained in the former step with $\tau_1$ until the distance to $P_0$ is negligible.
    1. K--;
    2. For each point obtained from step 4 go to step 3.
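The steps above can be sketched in code. This is a minimal illustration, not my original implementation: the classic Barnsley fern maps, the tolerance `eps`, and the choice of $\tau_1$ (the main map, whose fixed point $P_0$ is the tip of the fern and lies on the convex hull) are all assumptions.

```python
import math

# Affine contractive maps (assumed Barnsley fern parameters).
def affine(a, b, c, d, e, f):
    return lambda p: (a * p[0] + b * p[1] + e, c * p[0] + d * p[1] + f)

MAPS = [affine(0.85, 0.04, -0.04, 0.85, 0.0, 1.6),    # tau_1: main map
        affine(0.0, 0.0, 0.0, 0.16, 0.0, 0.0),
        affine(0.2, -0.26, 0.23, 0.22, 0.0, 1.6),
        affine(-0.15, 0.28, 0.26, 0.24, 0.0, 0.44)]
TAU1, P0 = MAPS[0], (2.6556, 9.9585)  # P0: fixed point of tau_1 (fern tip)

def pull(p, eps=1e-3):
    """Steps 2/4: apply tau_1 until the distance to P0 is negligible,
    collecting the trail of intermediate points."""
    trail = [p]
    while math.dist(p, P0) > eps:
        p = TAU1(p)
        trail.append(p)
    return trail

def deterministic_fractal(K, seed=(0.0, 0.0)):
    points = pull(seed)                       # step 2
    for _ in range(K):                        # K recursion levels
        new_points = []
        for p in points:
            for tau in MAPS[1:]:              # step 3: the other maps
                new_points.extend(pull(tau(p)))   # step 4
        points += new_points
    return points
```

No randomness is involved after the seed: every point is produced by a deterministic pull towards $P_0$. The point count grows roughly by a constant factor per level, so small values of $K$ already produce thousands of points.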
To explain how it works, I plotted the list of points obtained using $K=1$ iteration of the algorithm:

The bigger $K$ is, the more accurate the result will be.
The above procedure works only if the contractive map $\tau_1$ has a fixed point that lies on the convex hull of the fractal!!


I iterated the algorithm $K=4$ times. At each iteration the algorithm returns a deeper description of the contour of the fractal (...even though the definition of contour for a fractal doesn't make much sense, it at least conveys a practical meaning):
Results of the Recursive Algorithm iterated with K=1,2,3,4. 

If we try to overlap the results obtained with the original fern we get:
Original Fern, and the overlap with the results obtained using the recursive algorithm (K=4).
Conclusion and next steps
I showed how to depict the IFS generated through the Markov process as a recursive and completely deterministic process.
We also noticed that in the fractal there are special points (for instance $P_0$) that play a crucial role in describing the IFS.

The question now is the following:
is it possible to leverage such special points and the above recursive algorithm to estimate the parameters of the contractive maps?
... I will show you a procedure that partially answers the question, and some other examples of the algorithm described above!
Stay Tuned.

Saturday, May 24, 2014

Fractal parameters estimation thru contour analysis - Part I

I really love fractal theory. Nothing fascinates me like the amazing structures generated by the chaotic processes that describe (hyperbolic) fractals.
It's fairly easy to draw a fractal image:
  • The input
    • A small set of very simple functions (linear transformations)
    • A probability of choosing each of the aforementioned functions.
  • The process
    • Choose randomly a starting point.
    • Choose randomly one of the above functions, according to the above defined probabilities.
    • Map the starting point using the selected function.
    • Iterate as long as you want :)
  • Admire the meaning of life!
    • just plot the points
Wherever you are, whatever decision you make, despite the chaos, the overall picture will always be the same. The more you move, the clearer the picture will be!
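The process described above (often called the "chaos game") fits in a few lines of code. A minimal sketch, assuming the classic Barnsley fern maps and probabilities:

```python
import random

# (a, b, c, d, e, f, probability) for each affine map x' = ax+by+e, y' = cx+dy+f
MAPS = [
    (0.00,  0.00,  0.00, 0.16, 0.0, 0.00, 0.01),
    (0.85,  0.04, -0.04, 0.85, 0.0, 1.60, 0.85),
    (0.20, -0.26,  0.23, 0.22, 0.0, 1.60, 0.07),
    (-0.15, 0.28,  0.26, 0.24, 0.0, 0.44, 0.07),
]

def chaos_game(n, seed=0):
    rng = random.Random(seed)
    x, y = 0.0, 0.0                       # starting point
    points = []
    weights = [m[6] for m in MAPS]
    for _ in range(n):
        a, b, c, d, e, f, _p = rng.choices(MAPS, weights=weights)[0]
        x, y = a * x + b * y + e, c * x + d * y + f   # map the point
        points.append((x, y))
    return points   # just plot these to admire the fern
```

Whatever the random choices are, plotting the returned points always reveals the same overall picture.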

When I approached fractals for the first time, after reading Barnsley's great book (Fractals Everywhere), I opened my laptop and wrote the routine to draw my first IFS!
...By mistake I also plotted the lines that join the points:


...It's quite messy, isn't it?
Sometimes it's better to give up connecting the points, just relax, and wait until everything becomes clear. If we focus on every single detail we risk losing what really counts!


What is a fractal (Iterated Function System)?
There are many formal definitions of an IFS; in my opinion the most effective describes a fractal as the fixed point of the Markov process (mentioned above).
The process converges to the fractal :)

The power of fractals
In the literature there are plenty of articles in which fractals are used to solve complex problems (financial analysis, stock exchange predictions, description of biological processes, ...).
In Computer Science too, fractals have found a good home; think about image compression: the ivy leaf image takes around 25MB of space, while all the information needed to reproduce it through the IFS takes less than 1KB!!
The problem 
Given the ivy leaf image, what are the parameters of the Markov process to generate it?
A bit more formally what we have to estimate is:
  • A set of functions $\{ \tau_1, \cdots,\tau_N \}$, where $\tau_i(p_0)=A_i \cdot p_0 +Q_i$, with:
$ A_i= \begin{bmatrix}\alpha & \beta\\ \gamma & \delta \end{bmatrix} \qquad Q_i= \begin{bmatrix} \epsilon\\ \zeta \end{bmatrix} $
  • We also need to estimate the probability of choosing each $\tau_j$
The combination of the collage theorem and the box counting approach is the most common technique to solve the problem.
When I was a student I approached the problem from a different angle. I have to say that the results obtained were partial, but I still think that something more can be done :)
Before starting we need one more notion: contractive maps (have a look at the Banach fixed-point theorem).
Under certain conditions, the iteration of $\tau_j$ leads to a fixed point:
Example of contractive map applied to an ellipsoid.
It converges to a fixed point.
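A minimal illustration of this convergence, with an assumed example map (a contraction of factor 0.5 whose fixed point is $(2,3)$): iterating from any starting point lands on the fixed point.

```python
# An affine contraction: each application halves the distance to the
# fixed point (2, 3), so iteration converges from any starting point.
def tau(p):
    return (0.5 * p[0] + 1.0, 0.5 * p[1] + 1.5)

p = (100.0, -40.0)      # arbitrary starting point, far from the fixed point
for _ in range(60):
    p = tau(p)
# p is now numerically indistinguishable from the fixed point: tau(p) == p
```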

First conjecture:

  • An IFS is characterised by a fixed point that lies on its convex hull.
  • From a fixed point that lies on the border of the IFS, the iterations of the contractive maps that generate the fractal return the convex hull of the fractal.

The ivy leaf IFS is generated by 4 contractive maps. Each color describes the map used to generate the point.
The above animated gif shows the meaning of the conjecture.
Experimental evidence that at least one fixed point lies on the convex hull of the fractal can be obtained by changing the params of the maps:
The light blue points depict the fixed point of the maps used to plot the fractals.
Despite the changes in the maps, the fixed point on top of the leaf still lies on the convex hull of the fractal.
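For an affine map the fixed point can even be computed in closed form: $p^* = A p^* + Q$ gives $(I-A)p^* = Q$. A sketch for 2x2 maps, using the (assumed) Barnsley fern main map, whose fixed point is indeed the tip of the leaf:

```python
# Fixed point of tau(x, y) = (a*x + b*y + e, c*x + d*y + f),
# obtained by solving (I - A) p = Q with Cramer's rule.
def fixed_point(a, b, c, d, e, f):
    det = (1 - a) * (1 - d) - b * c
    x = ((1 - d) * e + b * f) / det
    y = (c * e + (1 - a) * f) / det
    return (x, y)

# Assumed main map of the Barnsley fern:
tip = fixed_point(0.85, 0.04, -0.04, 0.85, 0.0, 1.6)
# tip is approximately (2.6556, 9.9585): the topmost point of the fern
```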
In the next post I'll show you a nice recursive algorithm I found to obtain different levels of the convex hull of an IFS.
Stay tuned

Friday, May 2, 2014

Waiting Time Polynomials: tech explanation - last part

This is the last step to complete the explanation of the waiting time polynomials formula.

Unfortunately it's a bit technical, but I'm sure that it can be understood without deep math knowledge.
After all, if you can't explain it simply, you don't understand it well enough! (A. Einstein)

The trick
Last time we left off with the tally of the overall waiting time $ w(x_i) = \phi(x_i)-|x_i| $, where $\phi(x_i)$ returns the position of the last occurrence of $x_i$ in the vector $V$.
Let's have a look at the following example that will be used during the remaining steps.
There are two questions that need to be answered:

  • given $|x_1|= i, |x_2|= j, |x_3|= k $, what are the admitted tuples $\{w(x_1),w(x_2),w(x_3)\}$?
  • given $\{w(x_1)=I,w(x_2)=J,w(x_3)=K\}$, how many vectors $V$ can be built?
To answer the first question, I noticed that the three coloured cells are the only ones that really count.
The idea is the following:

  1. consider the three cells as placeholders
  2. analyse the admitted values for each of them
  3. replace the placeholders with all the possible permutations of the alphabet $\{x_1,x_2,x_3\}$.

Let's start with the case depicted in the above image, where we assumed that $\phi(x_1) < \phi(x_3) < \phi(x_2) $; then we have the following facts:
  • $ \phi(x_1)$ can take values in the range $0 \leq \phi(x_1) \leq |x_1|+|x_2|+ |x_3|-2$
  • $ \phi(x_2)$ can take just one value: $|V|=|x_1|+|x_2|+ |x_3|$
  • The upper bound of $ \phi(x_3)$ is $|V|-1$, because it can slide up to the second-to-last element of $V$, that is $\phi(x_2)-1$
  • What about the lower bound of $\phi (x_3)$? We have two cases, depicted in the image below:

To sum up, so far we have explained the following part of the formula (I hope you don't mind that I changed the index notation a bit):

We now have to consider that for each configuration of $\{w(x_1)=I,w(x_2)=J,w(x_3)=K\}$ we can have more than one vector $V$.
Do you like combinatorics? The remaining part is just a matter of tallying, and the only formula we need is the one for permutations with repetition. I'll let you brush up on the concept on your favourite website for basic combinatorics.
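For reference, the permutation-with-repetition count used throughout the tally below can be sketched as:

```python
from math import factorial

# Permutations with repetition: n objects split into groups of
# indistinguishable items of sizes n1, ..., nk can be arranged in
# n! / (n1! * n2! * ... * nk!) distinct ways.
def perms_with_rep(*group_sizes):
    denom = 1
    for g in group_sizes:
        denom *= factorial(g)
    return factorial(sum(group_sizes)) // denom

# e.g. the distinct arrangements of AABC: 4! / (2! * 1! * 1!) = 12
```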

The formula can be split into two chunks, because we have two blocks of cells to be filled:

  1. In how many ways can we fill the cells between the positions $[1,\phi(x_1)]$?
  2. In how many ways can we fill the cells between the positions $[\phi(x_1),\phi(x_2)]$?

To answer the first question we have to find the values of the denominator of the following:
  • slot 1 consists of the $\phi(x_1)-1$ cells that can be filled.
  • it contains $|x_1|-1$ instances of $x_1$: all of them except the last one, which occupies position $\phi(x_1)$.
  • the number of $x_3$ instances depends on $\phi(x_1)$ and $\phi(x_3)$:

  • the computation of the number of instances of $x_2$ in the first slot is straightforward, and it can easily be derived by difference:
    • $\frac{(\phi(x_1)-1)!}{(|x_1|-1)!(|x_3|-Min(|x_3|,\phi(x_3)))!\#(x_2)!}$;
    • $(|x_1|-1)+(|x_3|-Min(|x_3|,\phi(x_3)))+\#(x_2)=\phi(x_1)-1$
    • so $ \#(x_2)= \phi(x_1)-|x_1|-(|x_3|-Min(|x_3|,\phi(x_3))) $
This explains the following boxed part of the formula:

The final step is to count in how many ways we can fill slot 2, depicted by the interval $[\phi(x_1),\phi(x_2)]$; to make the formula more readable, let's rename $|x_3|-Min(|x_3|,\phi(x_3))= \epsilon$.
As we did for the first slot we have to identify the values of the denominator of the below formula:

  • Out of $|x_3|$ instances, $\epsilon$ have been placed in slot 1, so slot 2 contains exactly $|x_3|-1- \epsilon$ of them.
  • again by difference we can get the instances of $x_2$:
    • the occurrences of $x_2$ before $\phi(x_3)$ are exactly $\phi(x_3)- (|x_1|+|x_3|)$
    • the occurrences of $x_2$ in the slot 1 (listed above) are: $ \#(x_2)= \phi(x_1)-|x_1|-\epsilon $
    • that is : $ \#(x_2)=\phi(x_3)- (|x_1|+|x_3|)- \phi(x_1)+|x_1|+ \epsilon$
    • finally we have: $ \#(x_2)=\phi(x_3)-\phi(x_1)-|x_3|+ \epsilon$

That completes the proof of the formula.
It's quite easy now to extend the formula to more variables. The only trick needed to make it easier is to remove the operator $Min$ from the formula, splitting it into two branches.
I'll show it in the paper.
Note about Complexity
What's the most painful point of this formula?
... The introduction of placeholders requires applying the formula for each permutation of the variables involved. It means that with $k$ variables we need to calculate the formula $k!$ times.
Anyway, I don't expect to use the formula with large sets of variables; after all, the principle of selecting the right, small set of features is always recommended!

As usual Stay Tuned.

    Saturday, April 19, 2014

    Waiting Time Polynomials: how to derive the analytical formula: Part IV

    Introduction before you start
    I got many clarification requests about the Waiting Time Polynomials I published on the blog in the last three posts.
    The paper is almost ready to be submitted for review, but I think that some technical explanation might be interesting also for not academic audience.
    I consider myself a curious and hungry seasoned student, and I know how tedious it can be to read formulas and mathematical passages, especially when they come from a blog!!
    So why technical explanations?
    The answer is in the following quote of one of my favourite scientists, Gregory Chaitin. In "The quest for Omega" he wrote:

    The books I loved were books where the author’s personality shows through, books with lots of words, explanations and ideas, not just formulas and equations! I still think that the best way to learn a new idea is to see its history, to see why someone was forced to go through the painful and wonderful process of giving birth to a new idea! To the person who discovered it, a new idea seems inevitable, unavoidable. The first paper may be clumsy, the first proof may not be polished, but that is raw creation for you, just as messy as making love, just as messy as giving birth! But you will be able to see where the new idea comes from. If a proof is “elegant”, if it’s the result of two-hundred years of finicky polishing, it will be as inscrutable as a direct divine revelation, and it’s impossible to guess how anyone could have discovered or invented it. It will give you no insight, no, probably none at all. 

    That's the spirit that leads the following explanation!

    Definition of the problem
    Given an alphabet of 3 elements $\{X_1,X_2,X_3\}$, the function $w(X_i) $ counts  the number of failed trials before the last event $ X_i $.
    Consider now the following configuration: \[ \{\left\vert{X_1}\right\vert =i , \left\vert{X_2}\right\vert =j,\left\vert{X_3}\right\vert =k\}: i+j+k= Z \wedge i,j,k>0 \]

    • What are the admitted sequences  $\{w(X_1),w(X_2),w(X_3)\}$ ?

    Step I: Find all the possible configurations of events
    How can we list the sequences of length $Z$ that can be built with $ \{\left\vert{X_1}\right\vert =i , \left\vert{X_2}\right\vert =j,\left\vert{X_3}\right\vert =k\}: i+j+k= Z \wedge i,j,k>0$ ?

    Example of overall waiting time $w(x_i)$  in a succession of events.
    • once we set the values of the first two variables, the third is determined by $Z-i-j$.
    • we imposed that all the variables occur at least once, so $X_1$ can assume all the values in $[1,Z-2]$.
    • for each value of $X_1$, the variable $X_2$ can assume values in $[1,Z-i]$.
    • $p_i$ is the probability that $X_i$ occurs in a Bernoulli trial.
    Now we have all the ingredients to make the cake:

     $ \sum_{i=1}^{Z}\sum_{j=1}^{Z-i}\sum_{k=1}^{Z-i-j}{p_1^ip_2^jp_3^k}$  

    In the first two summations, $i$ assumes values in $[1,Z]$ just to keep the formula clean.
    ...I'll let you prove why the result doesn't change :).
    One last point about this step: the limit of the above summation as $ Z \rightarrow \infty$ is $\frac{p_1 p_2 p_3}{\left(1-p_1\right) \left(1-p_2\right) \left(1-p_3\right)}$.
    This limit will be used to build the probability density function.
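    A quick numerical sanity check of this limit (written here with $(1-p_i)$ factors, so that the value is positive); the probabilities below are assumed for illustration:

```python
# Truncated triple sum of p1^i * p2^j * p3^k over i, j, k >= 1 with
# i + j + k <= Z, compared against its Z -> infinity limit.
def triple_sum(p1, p2, p3, Z):
    total = 0.0
    for i in range(1, Z + 1):
        for j in range(1, Z - i + 1):
            for k in range(1, Z - i - j + 1):
                total += p1**i * p2**j * p3**k
    return total

p1, p2, p3 = 0.5, 0.3, 0.2   # assumed example probabilities
limit = p1 * p2 * p3 / ((1 - p1) * (1 - p2) * (1 - p3))
# triple_sum(p1, p2, p3, Z) approaches `limit` as Z grows
```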
    Curiosity (helpful for complexity analysis...):
    • The number of sequences that can be built with vectors of length in $[3,Z]$ is $\binom{Z}{3}$
    • The number of sequences that can be built with vectors of length exactly $Z$ is $\binom{Z-1}{2}$
    Step II: Waiting for an event!
    What's the easiest way to describe the overall waiting time for an event in a finite succession?
    There are many ways to get $w(x_i)$; the easiest I found is given by the position of the last occurrence of $x_i$ minus the number of occurrences of $x_i$.
    For instance, let's consider $w(x_1)$:
    • The position of the last occurrence of $x_1$ is $8$;
    • $\left \vert{X_1} \right \vert = 4 $
    • $w(X_1)=8-4=4$
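    This definition is a one-liner in code. The sequence below is an assumed example matching the numbers above ($x_1$ occurring 4 times, the last occurrence at position 8):

```python
# Overall waiting time w(x_i): position of the last occurrence of x_i
# minus the number of occurrences of x_i (positions are 1-based).
def waiting_time(seq, sym):
    last_pos = max(i + 1 for i, s in enumerate(seq) if s == sym)
    return last_pos - seq.count(sym)

# Assumed example: x1 occurs 4 times, the last at position 8, so w(x1) = 4.
V = ["x1", "x2", "x1", "x3", "x1", "x2", "x2", "x1", "x3"]
```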
    Where we are:
    The first two steps explain the circled pieces of the formula:

    What is the "overall waiting time" for?
    For each event $X_i$ we are counting the holes among all its occurrences, so the smaller the overall waiting time, the closer together the events $X_i$ are: it's a measure of proximity for the occurrences of $X_i$.
    What I did is extend such a measure (it would be interesting to prove that it's really a measure!) to different kinds of events (random variables) ${X_1, X_2,...,X_n}$ over the discrete time line.
    There are several areas for which this kind of analysis might be helpful; last time I showed its application as a powerful document classifier, where each variable $X_i$ is a word of a document.
    If we consider a document as a succession of $Z$ words, the proximity measure induced by the waiting time polynomials is a sort of fingerprint for the document, since for similar documents we expect the same words to be characterised by similar overall waiting times.
    Moreover, the dependencies among the words are taken into account, since we consider simultaneously an arbitrary number of words (the alphabet ${X_1, X_2,...,X_n}$).

    In the next post I'll explain the logic behind the remaining pieces of the puzzle, which will make it easier to generalise the approach to an arbitrary alphabet.
    Stay Tuned!

      Tuesday, April 8, 2014

      Coefficients of Waiting Time Polynomials listed in OEIS Database!

      I'm happy to announce that the sequence generated by the coefficients of the waiting time polynomials has been listed in the OEIS database (The Online Encyclopedia of Integer Sequences).
      The sequence is: A239700.
      In the next posts I'm going to explain how I derived the analytical formula of the polynomials.
      As usual: Stay Tuned!

      Friday, March 28, 2014

      Coefficients of Waiting Time Polynomials: a nice representation

      I was doing some simulation for the paper I'm writing about the waiting time polynomials, and I got something unexpected.
      If we sort in lexicographic order all the polynomials generated for a given length of the events vector, and we consider the coefficients, what you get is an apparently irregular series of integers.
      To capture some regularity, I decided to plot such matrix:

      The matrix plot, obtained considering an alphabet of three words and vectors having length=35.

      Isn't it nice?
      Sometimes an appropriate graphical representation helps capture interesting aspects of your data.

      Stay Tuned

      Tuesday, March 18, 2014

      Waiting Time Polynomials Document Classifier - Part III

      This post presents an innovative definition of polynomials associated to waiting time processes (analyzed in the former posts). Such polynomials are here successfully used as a document classifier. Comparative tests with SVM show significant accuracy improvements.
      Boolean Classification tests based on test set of 8k randomly generated documents composed using an alphabet of three words.
      To encourage those who found the formula I presented a couple of posts ago quite intricate, I'm going to show you a practical application of it that might be a good incentive to analyze the formal aspects with more attention :)
      What I'm showing you today is one of several applications of this approach: a document classifier having higher accuracy than traditional methods such as SVM (trained with a Gaussian kernel) and back-propagated Neural Networks.

      Characteristics of the classifier
      • It's a supervised learning algorithm
      • It's completely non parametric
      • It can natively classify multi-class datasets.

      The Algorithm
      Let's assume we have a training set composed of two classes of documents: Cl_1, Cl_2.

          Learning Phase: Estimation of Geometrically distributed Random Variables.
      1. Define an alphabet of three words {w1,w2,w3} using frequency criteria or more sophisticated techniques.
      2. For each class of the training set:
        • estimate the parameters {p1, p2, p3} of the respective geometrically distributed random variables {X1(w1),X2(w2),X3(w3)}.
      3. Calculate the polynomials associated to {X1(w1),X2(w2),X3(w3)} using:

          Testing Phase: document classification
      1. For each document Di of the test set:
        • Identify the number of occurrences of {w1,w2,w3}: {O_w1,O_w2,O_w3}
        • Select the polynomial whose term p1^O_w1 p2^O_w2 p3^O_w3 matches the observed occurrences {O_w1,O_w2,O_w3}.
        • Calculate the value of the polynomials P_Cl_1, P_Cl_2 using:
          1. {p1, p2, p3} estimated for Cl_1
          2. {p1, p2, p3} estimated for Cl_2
      2. Classify the document:
        1. If (P_Cl_1 > P_Cl_2) Di belongs to Cl_1
        2. Else Di belongs to Cl_2
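      The decision rule of the testing phase can be sketched as follows. This is only an illustration: `polynomial_value` is a hypothetical stand-in for the waiting time polynomial of the previous posts (not reproduced here), evaluating just the monomial p1^O_w1 * p2^O_w2 * p3^O_w3.

```python
import math

# Stand-in for the waiting time polynomial (assumption: only the monomial
# p1^O_w1 * p2^O_w2 * p3^O_w3 is evaluated here, for illustration).
def polynomial_value(p, occ):
    return math.prod(pi ** oi for pi, oi in zip(p, occ))

def classify(doc_words, alphabet, p_cl1, p_cl2):
    occ = [doc_words.count(w) for w in alphabet]   # {O_w1, O_w2, O_w3}
    p_1 = polynomial_value(p_cl1, occ)             # P_Cl_1
    p_2 = polynomial_value(p_cl2, occ)             # P_Cl_2
    return "Cl_1" if p_1 > p_2 else "Cl_2"
```

      The same rule extends to more than two classes by picking the class whose polynomial takes the largest value.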
      1. How to select the polynomial

      Let's consider the polynomials calculated using the aforementioned formula, and assume that in document Di the word w1 is repeated 2 times, the word w2 2 times, and w3 1 time.
      Then, in step 1 of the testing phase we have to choose the polynomial boxed in the list below:
      Polynomials generated for O_w1+ O_w2+O_w3=5
      2. Same polynomial, different values of  p1, p2, p3.

      - How many polynomials are generated for O_w1+ O_w2+O_w3 = 25?
      The answer is straightforward: 276, that is, all the possible configurations of 3 positive addends whose sum is 25. In the Online Encyclopedia of Integer Sequences there are exciting explanations of such series.
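      The count 276 is just the number of compositions of 25 into 3 positive addends, which has a closed form:

```python
from math import comb

# Compositions of 25 into 3 positive addends: choose 2 cut points
# among the 24 gaps, i.e. C(24, 2) = 276.
n_polynomials = comb(24, 2)

# Brute-force cross-check over all (a, b, c) with a + b + c = 25, a,b,c >= 1.
brute = sum(1 for a in range(1, 26) for b in range(1, 26) if 25 - a - b >= 1)
```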

      - How does the polynomial change for different values of p1, p2, p3?
      It's quite impressive how different two polynomials can be, despite the settings of {p1, p2, p3} being almost identical.
      Look at this example for which:
      • On the left chart I plotted a polynomial with p1= 0.33, p2= 0.33, p3=0.34.
      • On the right chart I plotted a polynomial with p1= 0.4, p2= 0.3, p3=0.3.
      • In the first case the polynomial takes the maximum value for O_w1=8, O_w2= 8, O_w3 =9 ...not a big surprise!
      • In the second case the polynomial takes the maximum value for O_w1=15, O_w2= 9, O_w3 =1. In this case the result is harder to explain!
      Document classification comparative test

      To test the accuracy of my method I performed a comparative test using a randomly generated document set.
      • The training set was composed of 100 documents (used to train a Gaussian-kernel SVM and to estimate the params p_i for my method).
      • The test set was composed of:
        • Cl_1: 4000 documents randomly generated using one configuration of {p1, p2, p3}.
        • Cl_2: 4000 documents randomly generated using a different configuration of {p1, p2, p3}.
      • The accuracy was tested using different configurations of {p1, p2, p3} and considering different sizes of documents.
      First experiment: 
      just the first 25 words were considered to estimate the params {p1, p2, p3}, to train the SVM, and to test the accuracy.
      Accuracy results considering just the first 25 words.
      Second experiment: 
      Same test as above using the first 35 words of the documents.

      Accuracy results considering just the first 35 words.
      All the results shown here refer to the average accuracy achieved over 10 different randomly generated test sets, with the SVM params tuned to maximize its accuracy.
      As you can see, the method I'm proposing, based on the definition of "Waiting Time Polynomials", performs significantly better than SVM.
      More comparative tests will be shown in the publication I'm writing on such topic.

      Further notes
      Processes based on waiting times or geometrically distributed random variables are extremely important for risk assessment and risk evaluation.
      I'll show you in another post some applications of such polynomials in this field.
      As usual, stay tuned