Page 330 - Data Science Algorithms in a Week
Predictive Analytics for Thermal Coal Prices Using Neural Networks … 311
Unfortunately, for the thermal coal price problem there are not enough observations
to calculate CA, so the traditional method was not used. Instead, cross-validation (CV)
was applied. As indicated by Moody and Utans (1992), CV is a sample reuse method that
can be used to estimate CA, and it makes minimal assumptions about the statistics of
the data. Each of the N instances in the training database is held out in turn, and the
neural network is trained on the remaining N – 1. The N squared prediction errors, one
for each held-out instance, are averaged, and the mean is the final estimate of CA.
This is expressed by the following equation (Moody and Utans, 1992):
$$CV(\lambda) = \frac{1}{N} \sum_{j=1}^{N} \left( t_j - \hat{\mu}_{\lambda(j)}(x_j) \right)^2 \qquad (1)$$
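The leave-one-out procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `loo_cv` and `fit_mean` are hypothetical, and `fit_mean` is a deliberately trivial stand-in for training the neural network, used only so the sketch runs without extra libraries.

```python
from typing import Callable, Sequence

Predictor = Callable[[float], float]


def loo_cv(xs: Sequence[float], ts: Sequence[float],
           fit: Callable[[Sequence[float], Sequence[float]], Predictor]) -> float:
    """Leave-one-out CV: mean squared error over all held-out predictions."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    total = 0.0
    for j in range(n):
        # Train on every observation except the j-th one.
        train_x = [x for i, x in enumerate(xs) if i != j]
        train_t = [t for i, t in enumerate(ts) if i != j]
        model = fit(train_x, train_t)
        # Score the model on the single held-out observation.
        total += (ts[j] - model(xs[j])) ** 2
    return total / n


def fit_mean(train_x: Sequence[float], train_t: Sequence[float]) -> Predictor:
    """Hypothetical stand-in model: always predicts the training mean."""
    mean = sum(train_t) / len(train_t)
    return lambda x: mean
```

In practice, `fit` would train a network with a given number of hidden neurons on the N – 1 retained instances and return its prediction function. The CV loop itself stays the same.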
Figure 6 represents the process of using CV to select an appropriate number of neurons
in the hidden layer. We considered candidate architectures ranging from 4 to 30 hidden
neurons. Figure 6 shows the CV value for each number of hidden neurons tested.
The lowest CV was obtained for the architecture with λ = 10. Therefore, we use 10 neurons
in the hidden layer of the neural network.
Figure 6. CV and the selection of neurons in the hidden layer. λ = 10 gave the lowest CV.
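The selection step amounts to evaluating CV once per candidate architecture and keeping the minimum. The sketch below assumes a caller-supplied function `cv_score` (a hypothetical name) that returns the CV estimate for a given number of hidden neurons. The default candidate range of 4 to 30 follows the text. Everything else is illustrative.

```python
from typing import Callable, Dict, Iterable, Tuple


def select_hidden_units(cv_score: Callable[[int], float],
                        candidates: Iterable[int] = range(4, 31)
                        ) -> Tuple[int, Dict[int, float]]:
    """Return the candidate with the lowest CV score, plus all scores."""
    # Evaluate the CV estimate once for each candidate hidden-layer size.
    scores = {h: cv_score(h) for h in candidates}
    # Ties are resolved in favour of the first (smallest) candidate.
    best = min(scores, key=scores.get)
    return best, scores
```

Returning the full dictionary of scores alongside the winner makes it straightforward to plot CV against the number of hidden neurons, as in Figure 6.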
Elimination of Input Variables
The next step was to select the input variables that contribute to the prediction of
the thermal coal price by removing those that are not required. To test which factors
are most significant in determining the output of the neural network with 10 hidden
neurons, we performed a sensitivity analysis; the results are depicted in Figure 7.
We defined the “Sensitivity” of the network model to input variable β as
(Moody and Utans, 1994):