Page 202 - Data Science Algorithms in a Week
P. 202

186               Luis Rabelo, Edgar Gutierrez, Sayli Bhide et al.

                       this project (in the order of petabytes) The data set was split in two separate sets: One for
                       training and the other one for validation. The objective was to predict when to do over-
                       haul of the respective RCC panel.
                          Table 1 shows the decile table for the 8,700 examples of the validation dataset (with
                       24  input  parameters).  There  are  870  examples  for  each  decile  as  shown  in  the  first
                       column. The second column shows the predicted responses of the different deciles which
                       were able to be predicted by the model. The third column is just the predicted response
                       rate in %. The fourth column is the cumulative response rate starting from the top decile
                       to the bottom one. For example, for the top decile is 856 divided by 870. On the other
                       hand, the cumulative response rate for the second decile is 856 plus 793 (1,649) divided
                       by the addition of 870 and 870 for the second decile (1,740). The Fifth column shows a
                       comparison between the different deciles with respective to the bottom one. For example,
                       the value of 1.32 for the top decile tells us that the model predicts 1.32 better than an
                       answer provided by no model (just randomly). The value of 1.32 is obtained by dividing
                       the predicted response rate of the top decile (98%) divided by the predicted response rate
                       of the bottom decile (74%). Therefore, that is the predictability.

                                       Table 1: Decile table with the respective columns.



























                          Figure 12 shows the bar-graph for the predicted responses. It is flat in general (i.e.,
                                                                                         rd
                                                   th
                       the predicted response of the 4  decile is greater than the one from the 3  decile). The
                       bars seem to be the same height for the first 5 deciles. Therefore, the model has moderate
                       performance (74% in the validation set).
   197   198   199   200   201   202   203   204   205   206   207