Page 74 - Data Science Algorithms in a Week
58 Olmer Garcia and Cesar Diaz
Finally, the last parameter is the padding, which adds a border of zeros
around the area that the filter sweeps.
Convolution Layer
The input layer is just the image and/or input data (e.g., 3D – height (N), width (N),
and depth (D)). Traditional Deep CNN uses the same height and width dimensions (i.e.,
squares). The convolution layer is next. The convolution layer is formed by filters (also
called kernels) that slide over the input layer. A filter has smaller height (F) and
width (F) than the previous layer (e.g., the input layer or an earlier one) but the
same depth. The filter processes the entire input layer, producing part of the
output of the convolution layer (smaller than the previous layer). It does so by being
positioned over successive areas (F by F) of the input layer. This positioning
advances in strides (S), the number of input neurons (within the N x N area)
to move at each step (i.e., strides are “the distance between the receptive field
centers of neighboring neurons in a kernel map” (Krizhevsky et al., 2012)). The
relationship of the input layer (or previous layer) (N x N x D) to the map produced by a
single pass of a filter of size (F x F x D) is:
Window size (e.g., number of neurons at that layer/level) = (N – F)/S + 1 (2)
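Equation (2) can be checked with a short sketch. The function name below is illustrative, not from the text; it simply evaluates (N - F)/S + 1 for a given input size, filter size, and stride:

```python
def conv_output_size(n, f, s):
    """Positions along one axis for a valid (unpadded) convolution: (N - F)/S + 1."""
    assert (n - f) % s == 0, "filter and stride must tile the input evenly"
    return (n - f) // s + 1

# Example: a 7x7 input swept by a 3x3 filter with stride 2 yields a 3x3 map.
print(conv_output_size(7, 3, 2))   # 3
# A 32x32 input with a 5x5 filter at stride 1 yields a 28x28 map.
print(conv_output_size(32, 5, 1))  # 28
```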
However, a convolution layer can have several filters (i.e., kernels) in order to produce a
kernel map as output. It is easy to see that the output is smaller than the input. This
can be problematic, particularly when applying large filters or building CNNs with many
layers and filters. For this, the concept of padding (P) is used. Zero-padding is the
addition of a border of zero-valued pixels of width P around the input layer. For the
output to keep the input's spatial size at stride S = 1, the relationship is as follows:
P = (F-1)/2 (3)
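Equations (2) and (3) combine into the general rule (N - F + 2P)/S + 1 for the padded case. A minimal sketch (function names are illustrative) confirms that P = (F - 1)/2 preserves the input size at stride 1:

```python
def same_padding(f):
    """Zero-padding width that preserves spatial size at stride 1: P = (F - 1)/2."""
    assert f % 2 == 1, "odd filter sizes give an integer padding"
    return (f - 1) // 2

def conv_output_size_padded(n, f, s, p):
    """General output size with padding: (N - F + 2P)/S + 1."""
    return (n - f + 2 * p) // s + 1

# A 5x5 filter needs P = 2; a 32-wide input then stays 32 wide at stride 1.
print(same_padding(5))                        # 2
print(conv_output_size_padded(32, 5, 1, 2))   # 32
```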
A convolution layer can have several filters, each of size (F x F x D), and this set
produces an output in the convolutional layer of depth equal to the number of filters in
the respective layer. The output matrix (i.e., kernel map) of the convolutional layer is the
product of the different filters being run over the kernel map of the previous layer. The
kernel map of a convolution layer can then be processed by successive convolution layers,
which need not have filters of the same dimensional size or number. Again, these layers
must be engineered. The weights and biases of these filters can be learned by
different algorithms such as backpropagation.
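The layer described above can be sketched as a naive NumPy loop (an illustration of the mechanics, not an efficient or library implementation): each of K filters of shape (F, F, D) is slid over the (N, N, D) input, and the output depth equals the number of filters.

```python
import numpy as np

def conv2d(inputs, filters, stride=1):
    """Naive valid (unpadded) convolution.

    inputs:  (N, N, D) input volume
    filters: (K, F, F, D) bank of K filters, each matching the input depth D
    returns: (M, M, K) kernel map, where M = (N - F) // stride + 1
    """
    n, _, d = inputs.shape
    k, f, _, fd = filters.shape
    assert d == fd, "filter depth must equal input depth"
    m = (n - f) // stride + 1
    out = np.zeros((m, m, k))
    for ki in range(k):                      # one output slice per filter
        for i in range(m):
            for j in range(m):
                r, c = i * stride, j * stride
                window = inputs[r:r + f, c:c + f, :]   # F x F x D receptive field
                out[i, j, ki] = np.sum(window * filters[ki])
    return out

x = np.random.rand(7, 7, 3)      # 7x7 input with depth 3 (e.g., RGB)
w = np.random.rand(4, 3, 3, 3)   # four 3x3x3 filters
print(conv2d(x, w).shape)        # (5, 5, 4): depth equals the number of filters
```

Note how the output depth (4) comes solely from the number of filters, while the spatial size (5) follows equation (2): (7 - 3)/1 + 1 = 5.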