
Predictive Analytics using Genetic Programming

                       predictive modeling system to find symptoms of damage, deterioration, or excessive wear
                       in future flights.

Figure 10: RCC is a lightweight heat-shielding material (NASA, 2008).

NASA assembled a Tiger Team that, from 2008 through 2011, studied potential issues with the shuttle’s Reinforced Carbon-Carbon (RCC) leading-edge panels (Dale, 2008). The Tiger Team’s investigation generated large amounts of structured and unstructured data on the RCC panels. This large body of data could then be used with different methodologies to build analysis and predictive models. One of the methodologies studied was genetic programming (GP).


                                            USING GENETIC PROGRAMMING

This section explains in more detail step 6 of the framework outlined in the section “Complexity and Predictive Analytics.” We assume that steps 1–5 have been completed successfully (an effort that can take several months for this case study).


                       Knowledge Discovery and Predictive Modeling

Input engineering is the investigation of the most important predictors. It involves several phases, such as attribute selection, which keeps the most relevant attributes by removing redundant and/or irrelevant ones. This leads to simpler models that are easier to interpret and to which some structural knowledge can be added. Different filters can be used for this purpose, each with its own objective, such as:

  • Information gain
  • Gain ratio
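The text does not say how these filters are computed or which tool was used in the study. As a minimal sketch, assuming discrete (categorical) attributes, the two filters named above can be computed as follows. Information gain is the drop in class entropy after splitting the records on an attribute, IG(A) = H(Y) − Σ_v (|S_v|/|S|)·H(S_v). Gain ratio divides that gain by the attribute's own split entropy, which penalizes attributes with many distinct values. The function names, the attribute names (`panel_id`, `coating_loss`, `inspection_shift`) and the four-row dataset are hypothetical illustrations, not data from the RCC investigation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in class entropy obtained by splitting on a discrete attribute."""
    n = len(labels)
    # Group the class labels by the attribute value they co-occur with.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(values, labels):
    """Information gain divided by the split information (entropy of the attribute itself)."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


def rank_attributes(rows, labels, score=information_gain):
    """Rank attribute names by a filter score, highest first (ties keep column order)."""
    scores = {name: score([r[name] for r in rows], labels) for name in rows[0]}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy records (NOT the NASA data): an ID-like column, a
# predictive attribute, and an irrelevant one.
rows = [
    {"panel_id": "P1", "coating_loss": "high", "inspection_shift": "day"},
    {"panel_id": "P2", "coating_loss": "high", "inspection_shift": "night"},
    {"panel_id": "P3", "coating_loss": "low", "inspection_shift": "day"},
    {"panel_id": "P4", "coating_loss": "low", "inspection_shift": "night"},
]
labels = ["damaged", "damaged", "ok", "ok"]

print(rank_attributes(rows, labels))                    # information gain
print(rank_attributes(rows, labels, score=gain_ratio))  # gain ratio
```

In this toy example the identifier-like `panel_id` ties with `coating_loss` under information gain (both score 1.0), because each of its values isolates a single record. Gain ratio halves its score to 0.5, while the irrelevant `inspection_shift` scores 0.0 under both filters. This is the usual reason to prefer gain ratio when attributes differ widely in their number of distinct values. Continuous attributes would need to be discretized before using this sketch.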