Page 80 - Data Science Algorithms in a Week

64 Olmer Garcia and Cesar Diaz

Our work is inspired by the German Traffic Signs data set provided by Stallkamp,
Schlipsing, Salmen, & Igel (2011), which contains about 40k training examples and 12k
testing examples. The same problem can be used as a model for Colombian traffic signs.
This is a classification problem that aims to assign the right class to a new image of a
traffic sign by training on the provided pairs of traffic sign images and their labels. The
project can be broken down into five parts: exploratory data analysis; data preprocessing
and data augmentation; the definition of a CNN architecture; training the model; and
testing the model and using it with other images.

Data Analysis

The database is a set of images that can be described computationally as a
dictionary with the following key/value pairs:

 • The image data set is a 4D array containing the raw pixel data of the traffic sign
images (number of examples, width, height, channels).
 • The label is an array containing the type of the traffic sign (number of samples,
traffic sign id).
 • The traffic sign id description is a file that contains the name and a short
description of each traffic sign id.
 • The bounding boxes are an array of tuples (x1, y1, x2, y2) representing the
coordinates of a bounding box around the sign in the image.
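
The structure described above can be sketched in Python with NumPy. This is only an illustration built on synthetic data: the key names ("features", "labels", "coords"), the common 32x32 image size, and the helper describe() are assumptions made for the example and are not taken from the original data set files. The label array is assumed to hold one traffic sign id per example.

```python
import numpy as np

# Synthetic stand-in for the data set; all names and sizes are hypothetical.
rng = np.random.default_rng(0)
n_examples, width, height, channels = 8, 32, 32, 3

dataset = {
    # 4D array of raw pixel data: (number of examples, width, height, channels)
    "features": rng.integers(
        0, 256, size=(n_examples, width, height, channels)
    ).astype(np.uint8),
    # One traffic sign id per example (43 classes, ids 0..42)
    "labels": rng.integers(0, 43, size=n_examples),
    # One bounding box (x1, y1, x2, y2) per example
    "coords": np.tile(np.array([3, 3, 29, 29]), (n_examples, 1)),
}


def describe(data):
    """Return the shape of every array stored in the dictionary."""
    return {key: value.shape for key, value in data.items()}


shapes = describe(dataset)
for key, shape in shapes.items():
    print(key, shape)
```

Inspecting the shapes first is a cheap way to confirm that images, labels, and bounding boxes line up example by example before any preprocessing is applied.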

It is essential to understand the data and how to manipulate it (Figure 5 shows some
randomly selected samples). This process of understanding and observing the data can
generate important conclusions such as:

 • It is a single-image, multi-class classification problem.
 • There are forty-three classes of traffic signs.
 • The ground-truth data is reliable due to semi-automatic annotation (Stallkamp,
Schlipsing, Salmen, & Igel, 2011).
 • Each image contains one traffic sign.
 • Images are not necessarily square; each contains a border of 10% around the
traffic sign, and the sign is not necessarily centered in the image.
 • Image sizes vary from 15x15 to 250x250 pixels.
 • The classes are highly imbalanced.
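
One way to check the class imbalance noted above is to count the examples per class and compare the largest class with the smallest. The sketch below is a minimal example under stated assumptions: the labels are synthetic and deliberately skewed, and the helper class_distribution() is a hypothetical name introduced here, not part of the original project.

```python
import numpy as np


def class_distribution(labels, n_classes=43):
    """Count examples per class and return the max/min ratio of non-empty classes."""
    counts = np.bincount(labels, minlength=n_classes)
    non_empty = counts[counts > 0]
    ratio = non_empty.max() / non_empty.min()
    return counts, ratio


# Synthetic, deliberately skewed labels: class i appears 10 * i + 20 times.
labels = np.repeat(np.arange(43), np.arange(43) * 10 + 20)

counts, ratio = class_distribution(labels)
print("examples per class:", counts)
print("largest / smallest class ratio:", ratio)
```

A ratio far above 1 suggests that some classes may need augmentation or re-weighting before training.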
