Page 74 - Data Science Algorithms in a Week

58                        Olmer Garcia and Cesar Diaz

                              Finally, there is the padding parameter, which adds a border of zeros around
                              the area that the filter sweeps.


                       Convolution Layer

                          The input layer is just the image and/or input data (e.g., a 3D volume with height (N),
                       width (N), and depth (D)). Traditional deep CNNs use the same height and width
                       dimensions (i.e., square inputs). The convolution layer comes next. It is formed by
                       filters (also called kernels) that run over the input layer. A filter has smaller sides
                       (height (F) and width (F)) than the previous layer (e.g., the input layer or another
                       layer) but the same depth. Each filter processes the entire input layer and produces
                       part of the output of the convolution layer, which is smaller than the previous layer.
                       The filter is applied by positioning it over successive F x F areas of the input layer.
                       This positioning advances in strides (S), the number of input neurons (within the
                       N x N area) to move at each step (i.e., strides are “the distance between the receptive
                       field centers of neighboring neurons in a kernel map” (Krizhevsky et al., 2012)). The
                       relationship between the input layer (or previous layer) of size N x N x D and the map
                       produced by passing a filter of size F x F x D over it is:

                          Output map side length (i.e., number of neurons along each side at that layer/level) = (N – F)/S + 1   (2)
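
The relationship in equation (2) can be checked numerically. The following is a minimal sketch in Python; the function name conv_output_size and the sample values are illustrative assumptions, not part of the original text.

```python
def conv_output_size(n: int, f: int, s: int = 1) -> int:
    """Side length of the map produced by an F x F filter sliding over an
    N x N input with stride S and no padding, following equation (2)."""
    if s < 1:
        raise ValueError("stride S must be at least 1")
    if f > n:
        raise ValueError("filter side F must not exceed input side N")
    if (n - f) % s != 0:
        # Assumption: we reject strides that do not tile the input evenly.
        raise ValueError("(N - F) must be divisible by S")
    return (n - f) // s + 1


# Hypothetical sizes chosen only for illustration.
print(conv_output_size(32, 5, 1))  # (32 - 5)/1 + 1 = 28
print(conv_output_size(7, 3, 2))   # (7 - 3)/2 + 1 = 3
```

Note that the depth D does not appear in the calculation: the filter spans the full depth of its input, so each filter contributes a single two-dimensional map.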

                       A convolution layer can, however, have several filters (i.e., kernels), each of which
                       produces a kernel map as output. It is easy to see from equation (2) that the size of
                       the image gets smaller with each layer. This can be problematic, in particular when
                       applying large filters or in CNNs that have many layers and filters. The concept of
                       padding (P) addresses this. Zero-padding is the addition of a border of zero-valued
                       pixels, P pixels wide, around the input layer. For a stride of 1, the padding that keeps
                       the output the same size as the input is:

                          P = (F-1)/2                                                              (3)
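
As a rough illustration of zero-padding and equation (3), the sketch below pads a small two-dimensional input and confirms that the output keeps its size. The helper names (same_padding, zero_pad, padded_output_size) are hypothetical. The padded size formula (N - F + 2P)/S + 1 is the commonly used extension of equation (2) and is an assumption here, since the text does not state it.

```python
def same_padding(f: int) -> int:
    """Padding P from equation (3); assumes an odd filter side F and stride 1."""
    if f % 2 == 0:
        raise ValueError("equation (3) gives a whole number only for odd F")
    return (f - 1) // 2


def zero_pad(image, p: int):
    """Return a copy of a 2D list surrounded by a border of zeros P wide."""
    width = len(image[0]) + 2 * p
    top_bottom = [[0] * width for _ in range(p)]
    middle = [[0] * p + list(row) + [0] * p for row in image]
    return top_bottom + middle + [[0] * width for _ in range(p)]


def padded_output_size(n: int, f: int, s: int, p: int) -> int:
    """Assumed extension of equation (2) to a padded input."""
    return (n - f + 2 * p) // s + 1


p = same_padding(3)                    # P = (3 - 1)/2 = 1
padded = zero_pad([[1, 2], [3, 4]], p)
for row in padded:
    print(row)
print(padded_output_size(32, 5, 1, same_padding(5)))  # stays at 32
```

With P chosen this way and a stride of 1, the 2P extra rows and columns exactly offset the F - 1 positions that the filter would otherwise lose.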

                          A convolution layer can have several filters, each of size F x F x D, and this set
                       produces an output in the convolutional layer whose depth equals the number of filters
                       in that layer. The output matrix (i.e., kernel map) of the convolutional layer is the
                       result of running the different filters over the kernel map of the previous layer. The
                       kernel map of a convolution layer can be processed by successive convolution layers,
                       which do not need to have filters of the same size or number. Again, these layers
                       must be engineered. The weights and biases that these filters use to produce their
                       respective outputs can be learned with different algorithms such as backpropagation.
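
To make the role of several filters concrete, here is a minimal sketch of a convolution layer written with plain Python lists. The function conv_layer, the fixed weights, and the toy input are assumptions made for illustration only. In practice the weights and biases would be learned, and no activation function or padding is applied here.

```python
def conv_layer(inputs, filters, biases, stride=1):
    """Run K filters (each F x F x D) over an N x N x D input.

    Returns an M x M x K output, where M = (N - F)/stride + 1, so the
    output depth equals the number of filters.
    """
    n = len(inputs)            # input side N
    d = len(inputs[0][0])      # input depth D
    f = len(filters[0])        # filter side F
    m = (n - f) // stride + 1  # output side, as in equation (2)
    output = [[[0.0] * len(filters) for _ in range(m)] for _ in range(m)]
    for k, (weights, bias) in enumerate(zip(filters, biases)):
        for i in range(m):
            for j in range(m):
                total = bias
                # Weighted sum over the F x F x D area under the filter.
                for di in range(f):
                    for dj in range(f):
                        for c in range(d):
                            pixel = inputs[i * stride + di][j * stride + dj][c]
                            total += pixel * weights[di][dj][c]
                output[i][j][k] = total
    return output


# Toy example: a 4 x 4 x 2 input of ones and two 3 x 3 x 2 filters.
image = [[[1.0, 1.0] for _ in range(4)] for _ in range(4)]
filter_a = [[[1.0, 1.0] for _ in range(3)] for _ in range(3)]
filter_b = [[[0.5, 0.5] for _ in range(3)] for _ in range(3)]
result = conv_layer(image, [filter_a, filter_b], biases=[0.0, 1.0])
print(len(result), len(result[0]), len(result[0][0]))  # 2 2 2
print(result[0][0])  # [18.0, 10.0]
```

The output has side (4 - 3)/1 + 1 = 2 and depth 2, one slice per filter, which is the kernel map that a following convolution layer would take as its input.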