Figure 9 shows the performance of the neural network (NN) developed with the most
important variables according to the sensitivity analysis. The neural network uses 8 variables
as input and 10 neurons in a single hidden layer, and its output is the quarterly thermal
coal futures price in US$.
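As a sketch of this topology, the example below builds an 8-input, 10-hidden-neuron, 1-output network with scikit-learn's MLPRegressor. The data, scaling, activation, and training settings are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the 8-10-1 network described above. The feature
# matrix X (8 sensitivity-selected inputs) and target y (quarterly
# thermal coal futures price, US$) are synthetic placeholders.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))   # 8 input variables (illustrative)
y = rng.normal(size=200)        # futures price (illustrative)

nn = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(10,),  # one hidden layer, 10 neurons
                 activation="logistic",     # assumed activation
                 max_iter=5000,
                 random_state=0),
)
nn.fit(X, y)
print(nn.predict(X[:5]))        # predicted prices for 5 samples
```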
Using Regression Trees to Predict the Price of Thermal Coal
It was decided to use a second artificial intelligence paradigm, regression trees, to
verify the results obtained with neural networks. This provided a good opportunity to
compare both methodologies. In regression trees, the objective is to model the
dependence of a response variable on one or more predictor variables. The analysis
method MARS, Multivariate Adaptive Regression Splines (Friedman, 1991), expresses
the response as a linear combination of basis functions of the predictor variables,
describing the problem in terms of this equation and identifying its most influential
variables. It is a non-parametric regression technique: MARS is an extension of linear
models that automatically models nonlinearities and interactions between variables. The
analysis determines the best possible variable to split the data into separate sets. The
splitting variable is chosen to maximize the average "purity" of the two child nodes.
Each node is assigned a predicted outcome value. This process is repeated recursively
until it is impossible to continue. The result is a maximum-sized tree that fits the
training data perfectly. The next step is to prune the tree to create a generalized model
that will work on outside data sets. Pruning is performed by reducing the cost-complexity
of the tree while maximizing its prediction capability. The optimal tree is the one that
provides the best prediction capability on outside data sets with the least complexity.
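The grow-then-prune procedure just described can be sketched with scikit-learn's DecisionTreeRegressor, which exposes the cost-complexity pruning path directly. The synthetic data and model settings below are illustrative assumptions, not the study's setup.

```python
# Grow a maximal regression tree, then prune by cost-complexity,
# keeping the pruning level that predicts best on held-out data.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))                       # illustrative predictors
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=300)  # illustrative response

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Maximal tree: grown until further splitting is impossible.
full = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)

# Candidate pruning levels along the cost-complexity path.
alphas = full.cost_complexity_pruning_path(X_tr, y_tr).ccp_alphas

# Select the pruned tree that generalizes best to outside data.
best = max(
    (DecisionTreeRegressor(random_state=0, ccp_alpha=a).fit(X_tr, y_tr)
     for a in alphas),
    key=lambda t: t.score(X_te, y_te),
)
print(best.get_depth(), best.score(X_te, y_te))
```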
Models based on MARS have the following form:
$$ f(X) = \alpha_0 + \sum_{m=1}^{M} \alpha_m h_m(X) \qquad (3) $$
where $h_m(X)$ is a function from a set of candidate basis functions (which can include
products of two or more such functions), and the coefficients $\alpha_m$ are obtained by
minimizing the residual sum of squares.
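To make the form of Equation (3) concrete, the short sketch below evaluates a model built from a reflected pair of hinge basis functions around a single knot, the candidate functions MARS uses. The knot location and the coefficients are hypothetical values chosen for illustration, not fitted from data.

```python
# Equation (3) with M = 2: a constant alpha_0 plus a weighted sum of
# the reflected pair of hinge basis functions (x - t)_+ and (t - x)_+.
import numpy as np

def hinge_pair(x, t):
    """Reflected pair of basis functions around knot t."""
    return np.maximum(x - t, 0.0), np.maximum(t - x, 0.0)

x = np.linspace(-2.0, 2.0, 9)
h_plus, h_minus = hinge_pair(x, t=0.5)      # hypothetical knot

# f(x) = alpha_0 + alpha_1 * (x - t)_+ + alpha_2 * (t - x)_+
alpha_0, alpha_1, alpha_2 = 1.0, 2.0, -0.5  # hypothetical coefficients
f = alpha_0 + alpha_1 * h_plus + alpha_2 * h_minus
print(f)
```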
The process of building a model with MARS is very straightforward. The procedure
calculates a set of candidate functions using reflected pairs of basis functions; in addition,
the number of constraints/restrictions and the allowed degree of interaction must be
specified. A forward pass follows, in which new products of functions are tried to see
which ones decrease the training error. After the forward pass comes a backward pass,
which corrects overfitting by pruning terms. Finally, generalized cross-validation (GCV)
is estimated in order to find the optimal number of terms in the model. GCV is defined by: