Page 83 - Data Science Algorithms in a Week
Machine Learning Applied to Autonomous Vehicles
Layer 1: Convolutional. Input = 32x32x1. Output = 28x28x6. Activation
function ReLU.
Layer 2: Sub-sampling (max-pooling). Input = 28x28x6. Output = 14x14x6.
Layer 3: Convolutional. Input = 14x14x6. Output = 10x10x16. Activation
function ReLU.
Layer 4: Sub-sampling (max-pooling). Input = 10x10x16. Output = 5x5x16.
Layer 5: Flatten layer, 3D to 1D. Input = 5x5x16. Output = 400.
Layer 6: Fully connected layer. Input = 400. Output = 120. Activation function
ReLU.
Layer 7: Fully connected layer. Input = 120. Output = 84. Activation function
ReLU.
Layer 8: Output layer. Input = 84. Output = 10. Apply the softmax function to
obtain the output. The 10 outputs correspond to the digits 0 through 9.
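The shape arithmetic in the list above can be checked with a short sketch. This assumes 5x5 convolution kernels with stride 1 and "valid" (no) padding, which is what the 32-to-28 reduction implies, and 2x2 non-overlapping max-pooling; a small softmax is included for the output layer.

```python
import math

def conv2d_size(n, kernel=5, stride=1):
    """Spatial size after a valid (unpadded) convolution."""
    return (n - kernel) // stride + 1

def maxpool_size(n, pool=2):
    """Spatial size after non-overlapping 2x2 pooling."""
    return n // pool

def lenet5_shapes(side=32, channels=1):
    """Output shape of each layer of the LeNet-5 variant listed above."""
    shapes = [(side, side, channels)]               # input
    s = conv2d_size(side); shapes.append((s, s, 6))   # Layer 1: conv
    s = maxpool_size(s);   shapes.append((s, s, 6))   # Layer 2: pool
    s = conv2d_size(s);    shapes.append((s, s, 16))  # Layer 3: conv
    s = maxpool_size(s);   shapes.append((s, s, 16))  # Layer 4: pool
    shapes.append((s * s * 16,))                      # Layer 5: flatten -> 400
    shapes += [(120,), (84,), (10,)]                  # Layers 6-8
    return shapes

def softmax(z):
    """Numerically stable softmax over the final layer's 10 scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

print(lenet5_shapes())
# -> [(32, 32, 1), (28, 28, 6), (14, 14, 6), (10, 10, 16),
#     (5, 5, 16), (400,), (120,), (84,), (10,)]
```

Running it reproduces exactly the input/output sizes listed for Layers 1 through 8.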
It is possible to modify LeNet-5 to accommodate the requirements of our problem.
We can start by redefining the size of the input images: for example, a square of
32 pixels with three channels (RGB), so that the input of Layer 1 becomes
32x32x3. The number of outputs (i.e., the number of classes) is changed
accordingly; in our traffic-sign implementation it is set to 43, so the output of
Layer 8 is 43. After training and validating, one can start changing parts of the
architecture, or trying new ones, based on the training criteria. This becomes an
iterative process in which one learns which parameters and layers should be
changed. One important question is how to obtain the initial values of the
weights. This can be done by sampling from a normal distribution; if, after
training, the parameter values turn out to be very small or very large, the
variance of the distribution can be adjusted.
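The two modifications and the initialization scheme can be sketched as follows. The layer dimensions (32x32x3 input, 43 classes) come from the text; the standard deviation `sigma=0.1` and the helper name `init_weights` are illustrative assumptions, not values from the chapter.

```python
import random

INPUT_SHAPE = (32, 32, 3)   # RGB traffic-sign images (new Layer 1 input)
NUM_CLASSES = 43            # traffic-sign classes (new Layer 8 output)

def init_weights(fan_in, fan_out, sigma=0.1, seed=0):
    """Draw a fan_in x fan_out weight matrix from N(0, sigma^2).

    sigma is the knob the analyst turns: if trained weights come out
    very small or very large, re-initialize with a different variance.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(fan_out)]
            for _ in range(fan_in)]

# Final fully connected layer of the modified network: 84 -> 43 scores.
w = init_weights(84, NUM_CLASSES, sigma=0.1)

# Sample statistics confirm the draw matches the chosen distribution.
flat = [x for row in w for x in row]
mean = sum(flat) / len(flat)
std = (sum((x - mean) ** 2 for x in flat) / len(flat)) ** 0.5
print(len(w), len(w[0]), round(std, 2))
```

With 84 x 43 samples the empirical standard deviation lands close to the chosen `sigma`, so changing the variance of the initialization is a one-parameter adjustment.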
Figure 7. The architecture of LeNet-5, a convolutional neural network, here for digit recognition.
Each plane is a feature map, i.e., a set of units whose weights are constrained to be identical.
Adapted and modified from LeCun et al. (1998).