Page 202 - Data Science Algorithms in a Week
186 Luis Rabelo, Edgar Gutierrez, Sayli Bhide et al.
this project (on the order of petabytes). The data set was split into two separate sets: one for
training and the other for validation. The objective was to predict when to overhaul
the respective RCC panel.
Table 1 shows the decile table for the 8,700 examples of the validation dataset (with
24 input parameters). There are 870 examples in each decile, as shown in the first
column. The second column shows the number of responses in each decile that the model
was able to predict. The third column is the predicted response rate in %. The fourth
column is the cumulative response rate, accumulated from the top decile to the bottom
one. For example, for the top decile it is 856 divided by 870, whereas for the second
decile it is 856 plus 793 (1,649) divided by the sum of 870 and 870 (1,740). The fifth
column compares each decile with the bottom one. For example, the value of 1.32 for the
top decile tells us that the model predicts 1.32 times better than an answer provided by
no model (just randomly). The value of 1.32 is obtained by dividing the predicted
response rate of the top decile (98%) by the predicted response rate of the bottom
decile (74%). That ratio is the measure of predictability.
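The arithmetic behind the decile columns can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and it uses only the figures quoted in the text (870 examples per decile, 856 and 793 predicted responses for the top two deciles, and the rounded rates of 98% and 74%). The remaining rows of Table 1 are not reproduced here.

```python
from itertools import accumulate

DECILE_SIZE = 870  # examples per decile, as stated in the text


def response_rates(predicted, size=DECILE_SIZE):
    """Predicted response rate of each decile, as a fraction."""
    return [count / size for count in predicted]


def cumulative_rates(predicted, size=DECILE_SIZE):
    """Cumulative response rate, accumulated from the top decile down."""
    running_totals = accumulate(predicted)
    return [total / (size * (i + 1)) for i, total in enumerate(running_totals)]


def ratio_to_bottom(rate, bottom_rate):
    """Ratio of a decile's response rate to that of the bottom decile."""
    return rate / bottom_rate


# Only the top two deciles are quoted in the text.
top_two = [856, 793]
rates = response_rates(top_two)
cumulative = cumulative_rates(top_two)

print(f"{rates[0]:.1%}")        # 98.4% (856 / 870)
print(f"{cumulative[1]:.1%}")   # 94.8% (1,649 / 1,740)

# Fifth column for the top decile, using the rounded rates from the text.
print(round(ratio_to_bottom(0.98, 0.74), 2))  # 1.32
```

Passing all ten predicted-response counts to the same functions would give the full third and fourth columns, and dividing each rate by the last one would give the fifth.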
Table 1: Decile table with the respective columns.
Figure 12 shows the bar graph for the predicted responses. It is flat in general (i.e.,
the predicted response of the 4th decile is greater than the one from the 3rd decile). The
bars seem to be the same height for the first 5 deciles. Therefore, the model has moderate
performance (74% in the validation set).