Knowing the dimensionality of each additional layer helps us understand how large
our model is and how our decisions about filter size and stride affect the size of our
network. With these parameters, we can calculate the number of neurons in each layer of a
CNN. Given an input layer with a width of W (i.e., a volume of W x W x D), a filter of
size F (a volume of F x F x D), a stride of S, and a padding of P, the following formula
gives us the width of the next layer:
Width of next layer: (W - F + 2P)/S + 1. (4)
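As a rough illustration, the following Python sketch (function and variable names are ours, not from the text) evaluates formula (4) for an example layer:

def conv_output_width(W, F, S, P):
    # Formula (4): spatial width of the next layer.
    # Assumes (W - F + 2P) is evenly divisible by the stride S.
    return (W - F + 2 * P) // S + 1

# Example: a 32 x 32 input, a 5 x 5 filter, stride 1, no padding -> width 28
print(conv_output_width(W=32, F=5, S=1, P=0))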
Pooling Layer
This layer can use several types of filters; one of the most common is max
pooling. A max-pooling filter of a given width and height slides over the input and extracts
the maximum value of each patch. Conceptually, the benefit of the max pooling operation is
to reduce the size of the input and to allow the neural network to focus only on the most
important elements. Max pooling does this by retaining only the maximum value for each
filtered area and discarding the remaining values. This technique helps avoid overfitting
(Krizhevsky et al., 2012). Variations such as mean (average) pooling are also used.
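A minimal NumPy sketch of 2 x 2 max pooling with stride 2 (illustrative only; the names and the example input are our own, not from the text):

import numpy as np

def max_pool_2x2(x):
    # x: input feature map of shape (H, W), with H and W assumed even
    H, W = x.shape
    # Group the map into non-overlapping 2 x 2 patches
    patches = x.reshape(H // 2, 2, W // 2, 2)
    # Keep only the maximum value of each patch
    return patches.max(axis=(1, 3))

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 0],
              [7, 2, 9, 8],
              [4, 1, 3, 5]])
print(max_pool_2x2(x))
# [[6 4]
#  [7 9]]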
Fully Connected Layer(s)
This layer type flattens the nodes into one dimension. A fully connected (dense) layer
connects every element (neuron) of the previous layer to every neuron in the current layer;
the resulting vector is then passed through an activation function. For example, LeNet-5
networks typically contain several dense layers as their final layers, and the final dense
layer actually performs the classification: there should be one output neuron for each class
or type of image to classify.
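To make the flattening and classification step concrete, here is a small NumPy sketch of a final dense layer with a softmax activation (the sizes and weights are hypothetical and are not taken from LeNet-5):

import numpy as np

def dense_softmax(feature_map, weights, bias):
    # Flatten the incoming feature map into a single vector
    x = feature_map.reshape(-1)
    # One output neuron per class: affine transform followed by softmax
    logits = weights @ x + bias
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Hypothetical sizes: a 5 x 5 x 16 feature map flattened to 400 inputs, 10 classes
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((5, 5, 16))
weights = rng.standard_normal((10, 400)) * 0.01
bias = np.zeros(10)
print(dense_softmax(feature_map, weights, bias))  # probabilities over the 10 classes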
Dropout Layer
Deep learning models normally have many nodes, and therefore many parameters, which
can lead to overfitting. Dropout is therefore used as a regularization technique
for reducing overfitting (Srivastava, Hinton, Krizhevsky, Sutskever, & Salakhutdinov,
2014). This layer “drops out” a random set of activations in that layer by setting them to
zero in the forward pass. During training, a good starting value is to drop each unit with
probability 0.5; during testing, all units are kept (a keep probability of 1.0), which maximizes
the generalization power of the model. There are some variations on this. Krizhevsky