Page 330 - Data Science Algorithms in a Week

Predictive Analytics for Thermal Coal Prices Using Neural Networks …   311

Unfortunately, for the thermal coal price problem there are not enough observations to calculate CA, so the traditional method was not used. Cross-validation (CV) was used instead. As Moody and Utans (1992) point out, CV is a sample re-use method that can be used to estimate CA while making minimal assumptions about the statistics of the data. Each instance of the training data set is held out in turn: the neural network is trained on the remaining N − 1 instances and evaluated on the held-out one. The N resulting squared errors, one for each instance of the data set, are averaged, and this mean is the final estimate of CA. This is expressed by the following equation (Moody and Utans, 1992):

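$$CV(\lambda) = \frac{1}{N}\sum_{j=1}^{N}\left(t_j - \hat{\mu}_{\lambda(j)}(x_j)\right)^2 \qquad (1)$$

where $N$ is the number of observations, $t_j$ is the observed target for instance $j$, $x_j$ is its input vector, and $\hat{\mu}_{\lambda(j)}$ is the network with $\lambda$ hidden neurons trained on all instances except instance $j$.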

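Equation (1) can be sketched directly in code. This is a minimal, illustrative sketch, not the authors' implementation: the names `loo_cv` and `fit_mean` are hypothetical, and `fit_mean` (a model that always predicts the mean training target) is only a stand-in for the neural network with λ hidden neurons used in the chapter.

```python
from typing import Callable, List, Sequence

# A predictor maps one input vector to a predicted target value.
Predictor = Callable[[Sequence[float]], float]
# A fit function trains on (inputs, targets) and returns a predictor.
FitFn = Callable[[List[Sequence[float]], List[float]], Predictor]


def loo_cv(xs: List[Sequence[float]], ts: List[float], fit: FitFn) -> float:
    """Leave-one-out CV estimate of Eq. (1): mean squared held-out error."""
    n = len(xs)
    total = 0.0
    for j in range(n):
        # Train on every instance except the j-th (N - 1 instances).
        train_x = xs[:j] + xs[j + 1:]
        train_t = ts[:j] + ts[j + 1:]
        model = fit(train_x, train_t)
        # Squared error on the held-out instance: (t_j - mu_hat_(j)(x_j))^2.
        total += (ts[j] - model(xs[j])) ** 2
    # Average over all N held-out instances.
    return total / n


def fit_mean(train_x: List[Sequence[float]], train_t: List[float]) -> Predictor:
    """Hypothetical stand-in model: always predicts the mean training target."""
    mean = sum(train_t) / len(train_t)
    return lambda x: mean
```

For example, with targets 1, 2 and 3 the held-out squared errors are 2.25, 0 and 2.25, so `loo_cv([[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0], fit_mean)` returns 1.5. Any training routine with the same call shape, such as a neural network trainer, can be passed as `fit`.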
Figure 6 shows how CV was used to select an appropriate number of neurons in the hidden layer. Candidate numbers of hidden neurons ranged from 4 to 30, and Figure 6 plots the CV estimate obtained for each candidate. The lowest CV was obtained for the architecture with λ = 10; therefore, the hidden layer of the neural network has 10 neurons.

Figure 6. CV and the selection of the number of neurons in the hidden layer. The lowest CV was obtained with λ = 10.
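The selection procedure can be sketched as a loop over candidate hidden-layer sizes that keeps the size with the lowest leave-one-out CV. This is a hedged sketch under stated assumptions, not the authors' code: the chapter does not specify the activation function, training algorithm or hyperparameters, so the tanh hidden layer, linear output, full-batch gradient descent and the `epochs`, `lr` and `seed` values are all assumptions, and the function names are hypothetical. It requires NumPy.

```python
import numpy as np


def train_mlp(x, t, hidden, epochs=2000, lr=0.05, seed=0):
    """Assumed model: 1 tanh hidden layer, linear output, full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.5, size=(x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    y = t.reshape(-1, 1)
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)  # hidden activations, shape (n, hidden)
        out = h @ w2 + b2  # linear output, shape (n, 1)
        # Gradient of half the mean squared error with respect to the output.
        g_out = (out - y) / len(x)
        g_w2 = h.T @ g_out
        g_b2 = g_out.sum(axis=0)
        # Backpropagate through tanh: d tanh(z)/dz = 1 - tanh(z)^2.
        g_h = (g_out @ w2.T) * (1.0 - h ** 2)
        g_w1 = x.T @ g_h
        g_b1 = g_h.sum(axis=0)
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    # Return a predictor for a batch of inputs, shape (m, n_inputs) -> (m,).
    return lambda xq: (np.tanh(xq @ w1 + b1) @ w2 + b2).ravel()


def loo_cv_mlp(x, t, hidden, **kw):
    """Leave-one-out CV (Eq. 1) for a network with `hidden` neurons."""
    n = len(x)
    sq = 0.0
    for j in range(n):
        mask = np.arange(n) != j  # hold out instance j
        model = train_mlp(x[mask], t[mask], hidden, **kw)
        sq += float((t[j] - model(x[j:j + 1])[0]) ** 2)
    return sq / n


def select_hidden(x, t, candidates=range(4, 31), **kw):
    """Return the candidate size with the lowest CV, plus all CV scores."""
    scores = {h: loo_cv_mlp(x, t, h, **kw) for h in candidates}
    # On ties, min() keeps the first (smallest) candidate in iteration order.
    best = min(scores, key=scores.get)
    return best, scores
```

For example, `select_hidden(x, t)` runs the 4 to 30 search described above on an input array `x` of shape (N, number of inputs) and a target array `t` of length N. The chapter's data are not reproduced here, so this sketch does not claim to recover λ = 10; on real data the inputs would normally be scaled before training.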


                       Elimination of  Input Variables

The next step was to select the input variables that contribute to the prediction of the thermal coal price, beginning by removing input variables that are not required. To determine which factors most strongly influence the output of the neural network with 10 hidden neurons, we performed a sensitivity analysis; the results are depicted in Figure 7. We defined the “Sensitivity” of the network model to input variable β as (Moody and Utans, 1994):