The reconstruction values for the visible and hidden units are generated by applying equations 4 and 5, or equation 7 for the GRBM, as explained by Mohamed et al. (2012), in a Markov chain using Gibbs sampling. After Gibbs sampling, the contrastive divergence learning rule for an RBM can be calculated and the weights of the neuron connections updated based on Δw. The literature also shows that the RBM learning rule (equation 9) may be modified with constants such as the learning rate, weight-cost, momentum, and mini-batch size for a more precise calculation of neuron weights during learning. Hinton et al. (2006) described the contrastive divergence learning in an RBM as efficient enough to be practical.
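To make the update rule concrete, the following NumPy sketch performs one contrastive-divergence (CD-1) step for a Bernoulli-Bernoulli RBM, including the learning-rate, momentum, and weight-cost constants mentioned above. The function and variable names (cd1_update, W, b_vis, b_hid, v_data) are illustrative assumptions, not code from the chapter.

```python
# Minimal sketch of one CD-1 update for a Bernoulli-Bernoulli RBM (illustrative names).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b_vis, b_hid, v_data, lr=0.01, momentum=0.9,
               weight_cost=0.0002, dW_prev=None, rng=None):
    """One CD-1 step on a mini-batch v_data of shape (n_samples, n_visible)."""
    if rng is None:
        rng = np.random.default_rng()
    n = v_data.shape[0]

    # Positive phase: hidden probabilities given the data, and a binary sample of them.
    h_prob = sigmoid(v_data @ W + b_hid)
    h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)

    # Negative phase: one Gibbs step yields the reconstructions of v and then h.
    v_recon = sigmoid(h_sample @ W.T + b_vis)
    h_recon = sigmoid(v_recon @ W + b_hid)

    # CD gradient: <v h>_data - <v h>_recon, averaged over the mini-batch.
    grad = (v_data.T @ h_prob - v_recon.T @ h_recon) / n

    # Momentum and weight-cost (L2 penalty) modify the basic learning rule.
    if dW_prev is None:
        dW_prev = np.zeros_like(W)
    dW = momentum * dW_prev + lr * (grad - weight_cost * W)

    W = W + dW
    b_vis = b_vis + lr * np.mean(v_data - v_recon, axis=0)
    b_hid = b_hid + lr * np.mean(h_prob - h_recon, axis=0)
    return W, b_vis, b_hid, dW
```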
In RBM neuron learning, the error between the visible-unit probabilities and their reconstruction probabilities computed after Gibbs sampling is gauged by the cross-entropy. The cross-entropy between the Bernoulli probability distributions of each element of the visible units $v^{\text{data}}$ and its reconstruction probabilities $v^{\text{recon}}$ is defined by Erhan, Bengio, and Courville (2010) as follows:


$\text{cross-entropy} = -\sum_{i}\left[\, v_i^{\text{data}} \log\left(v_i^{\text{recon}}\right) + \left(1 - v_i^{\text{data}}\right) \log\left(1 - v_i^{\text{recon}}\right) \right]$                               (10)

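A minimal NumPy sketch of equation (10), assuming v_data and v_recon are mini-batch arrays of visible-unit values and reconstruction probabilities; the small eps guard against log(0) is an added assumption.

```python
# Bernoulli cross-entropy between visible data and reconstruction probabilities (eq. 10).
import numpy as np

def reconstruction_cross_entropy(v_data, v_recon, eps=1e-10):
    """Mean cross-entropy over a mini-batch (rows are samples, columns visible units)."""
    v_recon = np.clip(v_recon, eps, 1.0 - eps)
    per_sample = -np.sum(v_data * np.log(v_recon)
                         + (1.0 - v_data) * np.log(1.0 - v_recon), axis=1)
    return per_sample.mean()
```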
For the final DBN learning phase, after each RBM in the stack has been pre-trained via greedy layer-wise unsupervised learning, the complete DBN is fine-tuned in a supervised way. The supervised learning via the backpropagation algorithm uses labeled data (classification data) to calculate the neuron weights of the complete deep belief neural network. Hinton et al. (2006) used the wake-sleep algorithm for fine-tuning a DBN. However, recent research has demonstrated that the backpropagation algorithm is faster and has lower classification error (Wulsin et al., 2011). In backpropagation, the derivative of the log probability distribution over class labels is propagated down to fine-tune all neuron weights in the lower levels of a DBN.
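As a rough illustration of this fine-tuning step, the sketch below unrolls the pre-trained layers into a feed-forward network with a softmax classifier on top and performs one backpropagation update of the negative log probability over class labels. The names (Ws, bs, W_out, b_out, finetune_step), the plain gradient-descent update, and the one-hot label format Y are assumptions for the example, not the authors' implementation.

```python
# Sketch of one supervised backpropagation step over a DBN unrolled into a
# feed-forward network; Ws/bs hold the pre-trained layer weights and biases.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def finetune_step(Ws, bs, W_out, b_out, X, Y, lr=0.1):
    """One backpropagation update on a mini-batch (X, Y); modifies the weights in place."""
    # Forward pass through the pre-trained sigmoid layers, then the softmax output layer.
    activations = [X]
    for W, b in zip(Ws, bs):
        activations.append(sigmoid(activations[-1] @ W + b))
    probs = softmax(activations[-1] @ W_out + b_out)

    # Gradient of the negative log probability over class labels at the output.
    n = X.shape[0]
    delta = (probs - Y) / n
    grad_W_out = activations[-1].T @ delta
    grad_b_out = delta.sum(axis=0)

    # Propagate the error signal down through every pre-trained layer.
    delta = delta @ W_out.T * activations[-1] * (1.0 - activations[-1])
    W_out -= lr * grad_W_out
    b_out -= lr * grad_b_out
    for i in reversed(range(len(Ws))):
        grad_W = activations[i].T @ delta
        grad_b = delta.sum(axis=0)
        if i > 0:
            delta = delta @ Ws[i].T * activations[i] * (1.0 - activations[i])
        Ws[i] -= lr * grad_W
        bs[i] -= lr * grad_b

    # Average cross-entropy on this batch, handy for monitoring the fine-tuning.
    return -np.mean(np.sum(Y * np.log(probs + 1e-10), axis=1))
```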
In summary, the Greedy Layer-Wise algorithm proposed by Hinton pre-trains the DBN one layer at a time using contrastive divergence and Gibbs sampling, starting from the bottom first layer of visible variables to the top of the network, one RBM at a time (Figure 5). After pre-training, the final DBN is fine-tuned in a top-down mode using several algorithms such as supervised backpropagation (Hinton & Salakhutdinov, 2006; Larochelle et al., 2009) or the wake-sleep algorithm (Hinton et al., 2006; Bengio, 2009), among others.
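The overall greedy layer-wise procedure can be sketched as below: one RBM is trained per layer, bottom-up, with contrastive divergence, and each layer's hidden probabilities become the input of the next RBM. It reuses the hypothetical cd1_update() from the earlier sketch; the layer sizes and epoch count are arbitrary placeholders.

```python
# Greedy layer-wise pre-training of a DBN, one RBM per layer (illustrative sketch).
import numpy as np

def pretrain_dbn(X, layer_sizes=(500, 500, 200), epochs=10, seed=0):
    """Returns one (W, b_vis, b_hid) triple per layer, trained bottom-up."""
    rng = np.random.default_rng(seed)
    layers, data, n_vis = [], X, X.shape[1]
    for n_hid in layer_sizes:
        W = 0.01 * rng.standard_normal((n_vis, n_hid))
        b_vis, b_hid, dW = np.zeros(n_vis), np.zeros(n_hid), None
        for _ in range(epochs):
            W, b_vis, b_hid, dW = cd1_update(W, b_vis, b_hid, data, dW_prev=dW, rng=rng)
        layers.append((W, b_vis, b_hid))
        # Hidden probabilities of this RBM serve as "visible" data for the next layer.
        data = 1.0 / (1.0 + np.exp(-(data @ W + b_hid)))
        n_vis = n_hid
    return layers
```

The returned layer parameters would then seed the supervised fine-tuning sketched above, completing the top-down phase.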