The reconstruction values for ℎ are generated by applying equations 4 and 5, or equation 7 for the GRBM as explained by Mohamed et al. (2012), in a Markov chain using Gibbs sampling. After Gibbs sampling, the contrastive divergence learning rule for an RBM can be calculated and the weights of the neuron connections updated based on Δw. The literature also shows that the RBM learning rule (equation 9) may be modified with constants such as the learning rate, weight-cost, momentum, and mini-batch size for a more precise calculation of neuron weights during learning. Hinton et al. (2006) showed that contrastive divergence learning in an RBM is efficient enough to be practical.
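A minimal NumPy sketch of such an update, assuming a binary RBM, a single Gibbs step (CD-1), and illustrative values for the learning rate, weight-cost, and momentum (none of these values are taken from the text), might look as follows:

```python
# Hedged sketch of one contrastive divergence (CD-1) update for a binary RBM.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v_data, W, b_vis, b_hid, velocity,
               lr=0.01, weight_cost=1e-4, momentum=0.5):
    """One CD-1 step on a mini-batch v_data of shape (batch, n_visible)."""
    batch_size = v_data.shape[0]

    # Positive phase: hidden probabilities given the data, plus a sampled state.
    h_prob_data = sigmoid(v_data @ W + b_hid)
    h_sample = (rng.random(h_prob_data.shape) < h_prob_data).astype(float)

    # Negative phase: one Gibbs step produces the reconstruction.
    v_recon = sigmoid(h_sample @ W.T + b_vis)
    h_prob_recon = sigmoid(v_recon @ W + b_hid)

    # Contrastive divergence gradient: <v h>_data - <v h>_recon.
    grad_W = (v_data.T @ h_prob_data - v_recon.T @ h_prob_recon) / batch_size

    # Momentum and weight-cost (L2 penalty) modify the basic learning rule.
    velocity = momentum * velocity + lr * (grad_W - weight_cost * W)
    W = W + velocity
    b_vis = b_vis + lr * (v_data - v_recon).mean(axis=0)
    b_hid = b_hid + lr * (h_prob_data - h_prob_recon).mean(axis=0)
    return W, b_vis, b_hid, velocity, v_recon
```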
In RBM neuron learning, cross-entropy provides a gauge of the error between the visible unit probabilities and their reconstruction probabilities computed after Gibbs sampling. The cross-entropy between the Bernoulli probability distributions of each element of the visible units $v_{\mathrm{data}}$ and its reconstruction probabilities $v_{\mathrm{recon}}$ is defined by Erhan, Bengio, and Courville (2010) as follows:
$\mathrm{CE} = -\sum_{i}\left[\, v_{\mathrm{data},i}\,\log(v_{\mathrm{recon},i}) + (1 - v_{\mathrm{data},i})\,\log(1 - v_{\mathrm{recon},i}) \,\right]$ (10)
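A minimal sketch of this computation, assuming NumPy arrays of matching shape for the data and reconstruction probabilities and a small epsilon added only for numerical stability, is:

```python
# Sketch of equation (10): cross-entropy between the Bernoulli visible-unit
# probabilities v_data and their reconstruction probabilities v_recon.
import numpy as np

def reconstruction_cross_entropy(v_data, v_recon, eps=1e-10):
    # Clip to avoid log(0); the epsilon is an assumption, not part of eq. (10).
    v_recon = np.clip(v_recon, eps, 1.0 - eps)
    return -np.sum(v_data * np.log(v_recon)
                   + (1.0 - v_data) * np.log(1.0 - v_recon))
```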
For the final DBN learning phase, after each RBM in the stack has been pre-trained via greedy layer-wise unsupervised learning, the complete DBN is fine-tuned in a supervised way.
The supervised learning via the backpropagation algorithm uses label data (classification
data) to calculate neuron weights for the complete deep belief neural network. Hinton et
al. (2006) used the wake-sleep algorithm for fine-tuning a DBN. However, recent
research has demonstrated that the backpropagation algorithm is faster and has a lower
classification error (Wulsin et al., 2011). In backpropagation, the derivative of the log
probability distribution over class labels is propagated to fine-tune all neuron weights in
the lower levels of a DBN.
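As a hedged illustration of this fine-tuning step (the single hidden layer, names, shapes, and learning rate are assumptions, not the authors' implementation), one backpropagation update through a DBN-initialized network with a softmax output layer could be sketched as:

```python
# Sketch of supervised fine-tuning: pre-trained RBM weights initialize the
# hidden layer, a softmax layer maps to class labels, and the gradient of the
# negative log class probability is propagated down to update all weights.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def finetune_step(x, y_onehot, W1, b1, W_out, b_out, lr=0.1):
    """One backpropagation step through a DBN-initialized network."""
    # Forward pass through the pre-trained hidden layer and the softmax top.
    h = sigmoid(x @ W1 + b1)
    p = softmax(h @ W_out + b_out)

    # Gradient of the mean negative log-probability of the correct labels.
    delta_out = (p - y_onehot) / x.shape[0]
    delta_h = (delta_out @ W_out.T) * h * (1.0 - h)

    # Propagate the error downward and update all weights (fine-tuning).
    W_out -= lr * h.T @ delta_out
    b_out -= lr * delta_out.sum(axis=0)
    W1 -= lr * x.T @ delta_h
    b1 -= lr * delta_h.sum(axis=0)
    return W1, b1, W_out, b_out
```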
In summary, the Greedy Layer-Wise algorithm proposed by Hinton pre-trains the
DBN one layer at a time using contrastive divergence and Gibbs sampling, starting from
the bottom first layer of visible variables to the top of the network, one RBM at a time
(Figure 5). After pre-training, the final DBN is fine-tuned in a top-down mode using
algorithms such as supervised backpropagation (Hinton & Salakhutdinov, 2006;
Larochelle et al., 2009) or the wake-sleep algorithm (Hinton et al., 2006; Bengio, 2009),
among others.
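A schematic of the greedy layer-wise procedure, in which train_rbm is a hypothetical placeholder for the CD-based updates sketched earlier and the layer sizes are arbitrary assumptions, could look like this:

```python
# Skeleton of greedy layer-wise pre-training: each RBM is trained on the
# activations of the layer below, one layer at a time, before fine-tuning.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=10):
    """Placeholder for CD-based RBM training; returns learned parameters."""
    rng = np.random.default_rng(0)
    W = 0.01 * rng.standard_normal((data.shape[1], n_hidden))
    b_hid = np.zeros(n_hidden)
    # ... CD-1 updates over mini-batches would run here for `epochs` passes ...
    return W, b_hid

def pretrain_dbn(x, layer_sizes=(500, 500, 200)):
    """Greedy layer-wise pre-training, bottom (visible) layer first."""
    weights, layer_input = [], x
    for n_hidden in layer_sizes:
        W, b_hid = train_rbm(layer_input, n_hidden)
        weights.append((W, b_hid))
        # The hidden activations become the 'visible' data for the next RBM.
        layer_input = sigmoid(layer_input @ W + b_hid)
    return weights  # supervised fine-tuning (e.g., backpropagation) follows
```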