
                      Training the Model

There are several platforms for implementing the training process from an algorithmic/software/hardware viewpoint. One of the most widely used is TensorFlow (https://www.tensorflow.org/), which is often employed as a backend by higher-level libraries. TensorFlow is an open-source software library for AI that performs mathematical operations efficiently. TensorFlow achieves this by:

    • Managing derivative computations automatically (see the sketch after this list).
    • Providing a computing architecture that supports asynchronous computation, queues, and threads in order to avoid long training sessions.
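
As a minimal sketch of the automatic-differentiation point, the following Python snippet uses TensorFlow 2's tf.GradientTape; the toy function and values are assumptions for illustration, not taken from the text:

    import tensorflow as tf

    # Record operations on x so TensorFlow can differentiate them for us.
    x = tf.Variable(3.0)
    with tf.GradientTape() as tape:
        y = x ** 2 + 2.0 * x        # y = x^2 + 2x
    dy_dx = tape.gradient(y, x)     # dy/dx = 2x + 2, which is 8 at x = 3
    print(dy_dx.numpy())            # prints 8.0

TensorFlow derives the gradient from the recorded operations, which is what lets a training loop apply gradient-based optimizers without hand-coded derivatives.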

The training process for CNNs has the following steps:

    • Split the data into a training set and a validation set. The validation set is used to estimate the accuracy of the model, while the training set is used to apply the gradient algorithm (see the first sketch following the pseudocode below).
    • Choose the type of optimizer: several algorithms can be used. The Adam stochastic optimization algorithm by Kingma & Ba (2014) is a typical selection. This scheme is a first-order, gradient-based optimization method for stochastic objective functions. In addition, it is well suited to problems that are large in terms of data and/or input parameters. The algorithm is simple and can be modified accordingly (a Python transcription is given in the second sketch following the pseudocode). Kingma and Ba (2014) detailed their algorithm (pseudocode) as follows:
      Require: α: Stepsize
      Require: β1, β2 ∈ [0, 1): Exponential decay rates for the moment estimates
      Require: f(θ): Stochastic objective function with parameters θ
      Require: θ0: Initial parameter vector
          m0 ← 0 (Initialize 1st-moment vector)
          v0 ← 0 (Initialize 2nd-moment vector)
          t ← 0 (Initialize timestep)
          while θt not converged do
              t ← t + 1 (Increase timestep t)
              gt ← ∇θ ft(θt−1) (Get gradients with respect to the stochastic objective at timestep t)
              mt ← β1 · mt−1 + (1 − β1) · gt (Update biased first-moment estimate)
              vt ← β2 · vt−1 + (1 − β2) · gt² (Update biased second raw moment estimate)
              m̂t ← mt/(1 − β1^t) (Compute bias-corrected first moment estimate)
              v̂t ← vt/(1 − β2^t) (Compute bias-corrected second raw moment estimate)
              θt ← θt−1 − α · m̂t/(√v̂t + ε) (Update parameters; ε is a small smoothing constant)
          end while
          return θt (Resulting parameters)
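
To make the first step (the data split) concrete, here is a minimal Python/NumPy sketch; the array names X and y, the toy shapes, and the 80/20 ratio are assumptions for illustration:

    import numpy as np

    rng = np.random.default_rng(seed=0)
    X = rng.normal(size=(1000, 32, 32, 3))   # toy image-like examples
    y = rng.integers(0, 10, size=1000)       # toy class labels

    idx = rng.permutation(len(X))            # shuffle before splitting
    split = int(0.8 * len(X))                # e.g., 80% training / 20% validation
    X_train, y_train = X[idx[:split]], y[idx[:split]]  # used by the gradient algorithm
    X_val, y_val = X[idx[split:]], y[idx[split:]]      # used to estimate accuracy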
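
As a second sketch, the Adam pseudocode above can be transcribed almost line by line into plain Python; the toy objective f(θ) = θ² and the step count are assumptions for illustration, while β1, β2, and ε are the defaults suggested by Kingma & Ba (2014):

    # Step size enlarged for this toy problem; the paper's suggested default is 0.001.
    alpha, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8

    theta = 5.0                    # θ0: initial parameter
    m, v = 0.0, 0.0                # m0, v0: first- and second-moment estimates

    for t in range(1, 1001):
        g = 2.0 * theta                              # gt = f'(θ) for f(θ) = θ²
        m = beta1 * m + (1 - beta1) * g              # biased first-moment estimate
        v = beta2 * v + (1 - beta2) * g ** 2         # biased second raw moment estimate
        m_hat = m / (1 - beta1 ** t)                 # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)                 # bias-corrected second moment
        theta -= alpha * m_hat / (v_hat ** 0.5 + eps)  # parameter update

    print(theta)   # near 0, the minimizer of f (oscillates within about α of it)

Note the characteristic design choice: the per-parameter step is scaled by the ratio of the bias-corrected moment estimates, so its magnitude stays roughly bounded by the step size α regardless of the raw gradient scale.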