Page 84 - Data Science Algorithms in a Week
68 Olmer Garcia and Cesar Diaz
Training the Model
Several platforms are available for implementing the training process from the
algorithmic, software, and hardware viewpoints. One of the most widely used platforms is
TensorFlow as a backend (https://www.tensorflow.org/). TensorFlow is an open source
software library for AI that performs mathematical operations efficiently. TensorFlow
achieves this by:
Computing derivatives automatically.
Providing a computing architecture that supports asynchronous computation,
queues, and threads, which helps to avoid long training sessions.
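To make the idea of automatic derivative computation concrete, the following is a minimal sketch in plain Python. It is a toy forward-mode scheme based on dual numbers and is not how TensorFlow itself is implemented; the names Dual, derivative, and loss are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to one input."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants, whose derivative is zero."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def derivative(f, x):
    """Evaluate df/dx at x by seeding the input derivative with 1."""
    return f(Dual(float(x), 1.0)).deriv


def loss(w):
    # Hypothetical squared-error loss for a single weight: (3w - 6)^2
    residual = 3 * w - 6
    return residual * residual


# The derivative is obtained without writing it out by hand.
grad_at_1 = derivative(loss, 1.0)
```

The function loss is written as ordinary arithmetic, and its derivative is obtained as a by-product of evaluating it. A library such as TensorFlow applies the same principle to functions of millions of parameters, which is what makes gradient-based training practical.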
The training process for CNNs has the following steps:
Split the available data into training and validation sets. The training data is
used to apply the gradient algorithm, whereas the validation data is used to
calculate the accuracy of the estimation.
Type of optimizer: Several algorithms can be used. Adam, the stochastic
gradient-based optimization method by Kingma & Ba (2014), is a typical selection.
This scheme is a first-order gradient-based optimization of stochastic objective
functions. In addition, it is well suited for problems that are large in terms of data
and/or input parameters. The algorithm is simple and can be adapted as needed.
Kingma and Ba (2014) give the following pseudocode for their algorithm:
Require: α: Stepsize
Require: β_1, β_2 ∈ [0, 1): Exponential decay rates for the moment estimates
Require: f(θ): Stochastic objective function with parameters θ
Require: θ_0: Initial parameter vector
m_0 ← 0 (Initialize 1st-moment vector)
v_0 ← 0 (Initialize 2nd-moment vector)
t ← 0 (Initialize timestep)
while θ_t not converged do
t ← t + 1 (Increase timestep t)
g_t ← ∇_θ f_t(θ_{t−1}) (Get gradients with respect to the stochastic objective at t)
m_t ← β_1 · m_{t−1} + (1 − β_1) · g_t (Update biased first-moment estimate)
v_t ← β_2 · v_{t−1} + (1 − β_2) · g_t^2 (Update biased second raw moment estimate)
m̂_t ← m_t/(1 − β_1^t) (Compute bias-corrected first moment estimate)
v̂_t ← v_t/(1 − β_2^t) (Compute bias-corrected second raw moment estimate)