Where x_i is the value of each input to the node, w_i are the weight parameters that multiply each input, b is known as the bias parameter, and f(·) is known as the activation function. The commonly used functions are the sigmoid, the hyperbolic tangent, and the rectified linear unit (ReLU). Heaton (2015) proposes that while most current literature in deep learning suggests using the ReLU activation function exclusively, it is necessary to understand the sigmoid and hyperbolic tangent functions to see the benefits of ReLU.
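As a minimal sketch (not code from the chapter), the node described above computes y = f(sum_i w_i x_i + b). The following Python snippet, using illustrative input and weight values, evaluates that expression under each of the three activation functions mentioned:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def relu(z):
        return np.maximum(0.0, z)

    def node_output(x, w, b, f):
        # y = f(sum_i w_i * x_i + b)
        return f(np.dot(w, x) + b)

    x = np.array([0.5, -1.2, 3.0])   # input values x_i (illustrative)
    w = np.array([0.4, 0.1, -0.6])   # weight parameters w_i (illustrative)
    b = 0.2                          # bias parameter b

    for f in (sigmoid, np.tanh, relu):
        print(f.__name__, node_output(x, w, b, f))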
Varying the weights and the bias varies the amount of influence any given input has on the output. The learning aspect of neural networks takes place through a process known as back-propagation, the most common training algorithm, developed in the 1980s. In the learning process, the network modifies the weights and biases to improve the network's output, as in any machine learning algorithm. Back-propagation is an optimization process that uses the chain rule of differentiation to minimize the error in order to improve the output accuracy. This process is carried out by numerical methods, among which stochastic gradient descent (SGD) is the dominant scheme.
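To make the chain-rule computation concrete, the sketch below (again illustrative, not the chapter's code) performs one SGD step for a single sigmoid node trained on a squared error E = (y - t)^2 / 2; each gradient factor follows directly from the chain rule:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sgd_step(x, w, b, t, lr=0.1):
        z = np.dot(w, x) + b          # pre-activation
        y = sigmoid(z)                # node output
        # Chain rule: dE/dw_i = (y - t) * f'(z) * x_i, with f'(z) = y * (1 - y)
        delta = (y - t) * y * (1.0 - y)
        w_new = w - lr * delta * x    # gradient descent on the weights
        b_new = b - lr * delta        # and on the bias
        return w_new, b_new

    x = np.array([0.5, -1.2, 3.0])    # illustrative inputs
    w = np.array([0.4, 0.1, -0.6])    # illustrative initial weights
    b, t = 0.2, 1.0                   # bias and target output
    for _ in range(100):
        w, b = sgd_step(x, w, b, t)   # repeated steps drive the error down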
Finally, the way in which the nodes are connected defines the architecture of the neural network. Some of the most widely known architectures are as follows:

• Self-organizing maps (Kohonen, 1998): An unsupervised learning algorithm used for clustering problems, applied principally to gain insight into perception problems.
• Feedforward artificial neural networks (Widrow & Lehr, 1990): A supervised learning algorithm used for classification and regression; it has been applied to robotics and vision problems. This architecture is very common in traditional neural networks (NNs) and was heavily used in the multilayer perceptron. Such networks can serve as universal function approximators.
• Boltzmann machines (Hinton, Sejnowski, & Ackley, 1984): A supervised learning algorithm used for classification and optimization problems. A Boltzmann machine is essentially a fully connected two-layer neural network.
• Hopfield neural networks (Hopfield, 1982): A supervised learning algorithm used for classification and optimization problems. It is a fully connected, single-layer, auto-associative network. It works well with incomplete or distorted patterns and can be used for optimization problems such as the traveling salesman problem.
• Convolutional neural networks (CNNs): Although Fukushima (1980) introduced the concepts behind CNNs, many authors have worked on them since. LeCun et al. (1998) developed the LeNet-5 architecture, which has become one of the most widely accepted. A CNN is a supervised learning algorithm that maps its input onto 2D grids. CNNs have taken image recognition to a higher level of capability. This advance in CNNs is due to years of research on