Page 82 - Data Science Algorithms in a Week
66 Olmer Garcia and Cesar Diaz
Pre-Processing and Data Augmentation
The input images to the neural network go through a few preprocessing steps that
help the network train. Preprocessing can include:
Resizing the image: The network requires inputs of a fixed size; 32x32 pixels is a
common choice based on the literature.
Color Space Conversion: If you believe that color does not matter for the
classification, the images can be converted to grayscale; alternatively, they can be
transformed from the RGB (Red, Green, and Blue) space to another color space
such as HSV (Hue, Saturation, and Value). Another approach is to balance the
brightness and contrast of the images.
Normalization: This step is very important because neural network algorithms
work best when the data lie in a fixed interval, normally between 0 and 1 or
between -1 and 1. One common scheme is to zero-center each dimension and then
divide it by its standard deviation. This gives every feature a similar range, so
the gradients do not go out of control (Heaton, 2013).
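The three preprocessing steps above can be sketched with plain NumPy. This is a minimal illustration, not the chapter's implementation: the nearest-neighbor resize and the grayscale weights (the common 0.299/0.587/0.114 luminance coefficients) are assumptions; a real pipeline would typically use OpenCV or Pillow for higher-quality interpolation.

```python
import numpy as np

def preprocess(img, size=32):
    """Resize, convert to grayscale, and normalize one RGB image.

    img: float array of shape (h, w, 3) with values in [0, 1].
    Returns a zero-centered, unit-variance array of shape (size, size).
    """
    h, w, _ = img.shape
    rows = np.arange(size) * h // size          # nearest-neighbor row indices
    cols = np.arange(size) * w // size          # nearest-neighbor column indices
    img = img[rows][:, cols]                    # resize to (size, size, 3)
    gray = img @ np.array([0.299, 0.587, 0.114])  # RGB -> grayscale (assumed weights)
    gray = gray - gray.mean()                   # zero-center
    return gray / (gray.std() + 1e-8)           # scale to roughly unit variance

rng = np.random.default_rng(0)
x = preprocess(rng.random((48, 64, 3)))
```

After this step every image has the same shape and a comparable value range, which keeps the gradient magnitudes stable during training.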
Unbalanced data, as shown in Figure 6, means that there are many more samples of
some traffic signs than of others. This can lead to overfitting and other problems during
the learning process. One solution is to generate new images: take existing images at
random and transform them with a random combination of the following techniques:
Translation: Shift the image horizontally or vertically by a few pixels relative to
the center of the image.
Rotation: Rotate the image by a random angle about its center.
Affine transformations: Zoom in on the image or change the perspective of the
image.
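The random translation and rotation above can be sketched with SciPy's ndimage routines. This is an illustrative sketch, not the chapter's code: the shift range (±4 pixels) and angle range (±15 degrees) are assumed values chosen for demonstration.

```python
import numpy as np
from scipy.ndimage import rotate, shift

def augment(img, rng):
    """Apply a random translation and rotation to one grayscale image.

    The ranges below (+/-4 px, +/-15 deg) are illustrative assumptions,
    not values taken from the chapter.
    """
    dy, dx = rng.integers(-4, 5, size=2)            # random pixel shift
    out = shift(img, (dy, dx), order=1, mode="nearest")
    angle = rng.uniform(-15.0, 15.0)                # random angle in degrees
    out = rotate(out, angle, reshape=False, order=1, mode="nearest")
    return out                                      # same shape as input

rng = np.random.default_rng(42)
img = rng.random((32, 32))
aug = augment(img, rng)
```

Because `reshape=False` and the `mode="nearest"` border fill keep the output the same size as the input, the augmented images can be fed to the network alongside the originals, evening out the class counts without collecting new data.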
Definition of an Initial CNN Architecture
A good way to start assembling your own deep neural network is to review the
literature and look for a deep learning architecture that has been used on a similar
problem. One of the earliest such architectures is LeNet-5, presented by LeCun et al.
(1998) (Figure 7). Let's assume that we select LeNet-5. The first step is then to
understand its structure, which consists of seven layers, not counting the input (eight if
the input is counted). LeNet-5 is explained as follows: