Page 193 - Data Science Algorithms in a Week
P. 193

Predictive Analytics using Genetic Programming            177

                          GP is basically a variation of the genetic algorithm and it follows the standards of
                       EAs as outlined above. The population of individuals in GP are computer programs. The
                       representation  of  individuals/computer  programs  are  trees.  These  hierarchical  or
                       structured  trees  are  the  population  and  they  can  have  different  sizes.  Given  the  tree
                       representation, genetic operators such as tree crossover must be modified accordingly.
                          The following computer programs are parental programs as  displayed in Figure 2.
                       The first one is: 0.25Y + X + 1.75 and using a LISP S-expression (+ (* 0.25  Y) (+ X
                       1.75)). The second program is: XY(X / 0.455Z) and using a LISP S-expression  (* (* Y
                       X) (/  X  (*  0.455  Z)). These parental programs have “point-labeled” trees with ordered
                       branches.

















                       Figure 2: Two parental computer programs which are rooted with ordered branches.

                          The standard crossover operation of an EA in the case of GP creates offspring by the
                       exchange  of  subtrees.  Subtrees  can  be  considered  subroutines,  subprocedures  or
                       subfunctions. These subtrees selected by GP are essentially for crossover (see Figure 3).
                       For example, it can select specific labeled sections of the parental computer programs of
                       Figure 2 and decide on a subtree from each parent to be used for the creation of offspring
                       (Figure 3). The first one is selected from point labeled 2 (randomly selected) of the first
                       parental computer program: 0.25Y and using a LISP S-expression (* 0.25  Y). The second
                       subtree  is  selected  from  point  labeled  5  (randomly  selected)  of  the  second  parental
                       computer program: X / 0.455Z and using a LISP S-expression (/  X  (*  0.455  Z)).














                       Figure 3: Subtrees of two parental computer programs selected for crossover.
   188   189   190   191   192   193   194   195   196   197   198