Page 59 - Reclaim YOUR DIGITAL GOLD (without audio)


DATA COLLECTION HARVESTING

In addition, we’ll need to split the data into two parts.
The first part, which will contain the majority of the
data, will be used to train our model. The second part
will be used to assess the performance of our trained
model. We don’t want to test the model on the same data
that trained it, because that would only show how well it
memorized the examples, just as you would not reuse the
questions from your homework exercises on the exam.
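
The split described above can be sketched in a few lines of Python.
This is only an illustration: the function name train_test_split, the
80/20 ratio, the fixed seed, and the sample rows are assumptions, not
values taken from the text.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                      # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)      # seeded so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Made-up samples for illustration: (color, alcohol_percent)
samples = [(600 + i, 4.0 + i) for i in range(10)]
train, test = train_test_split(samples, test_fraction=0.2)
print(len(train), "training rows,", len(test), "evaluation rows")
```

Shuffling before splitting matters when the data was collected in
some order; otherwise the evaluation part could end up containing
only one kind of sample.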

There are times when the data we collect requires
additional tweaking and processing. This includes, but
is not limited to, de-duplication, normalization, and error
correction. All of these steps take place during the
data preparation process.
We don’t need any additional data preparation in our
situation, so let’s move on.
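
For cases where preparation is needed, here is a minimal sketch of two
of the steps named above, de-duplication and normalization. The helper
names and the sample values are hypothetical, and min-max scaling is
just one common way to normalize.

```python
def deduplicate(rows):
    """Drop exact duplicate rows, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # all values equal: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up rows for illustration: (color, alcohol_percent)
raw = [(610, 5.0), (599, 13.0), (610, 5.0), (693, 14.0)]
clean = deduplicate(raw)
colors = min_max_normalize([color for color, _ in clean])
```

Normalizing puts features that are measured on very different scales
onto a comparable range, which many models handle more gracefully.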

THE MODEL SELECTION PROCEDURE

The next step in our workflow is to choose a model.
Researchers and data scientists have created a wide
range of models over the years. Some are best suited
for image data, others for sequences (such as text or
music), and still others for numerical or text-based
data. Because we have only two features, color and
alcohol content, we can use a small linear model, which
is reasonably simple and should work.
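
To make "a small linear model" concrete, here is one possible shape
for it, assuming a two-class problem. The class name, the 0/1 output,
and the hand-picked weights are illustrative assumptions; in practice
the weights would be learned from the training data rather than
written by hand.

```python
from dataclasses import dataclass


@dataclass
class LinearModel:
    """Two-feature linear model: score = w_color*color + w_alcohol*alcohol + bias."""
    w_color: float = 0.0
    w_alcohol: float = 0.0
    bias: float = 0.0

    def score(self, color, alcohol):
        return self.w_color * color + self.w_alcohol * alcohol + self.bias

    def predict(self, color, alcohol):
        # Class 1 if the score is positive, otherwise class 0.
        return 1 if self.score(color, alcohol) > 0 else 0


# Hand-picked weights, for illustration only.
model = LinearModel(w_color=0.0, w_alcohol=1.0, bias=-9.0)
strong = model.predict(color=0.5, alcohol=13.0)   # score = 4.0
mild = model.predict(color=0.5, alcohol=5.0)      # score = -4.0
```

With only three numbers to adjust (two weights and a bias), a model
like this is quick to fit and easy to inspect.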

TRAINING

We will now go over the training phase, which is widely
regarded as the most time-consuming aspect of machine
