Page 59 - Reclaim YOUR DIGITAL GOLD (without audio)

Data Collection Harvesting



            In addition, we’ll need to split the data into two parts.
            The first part, which contains the majority of the data,
            will be used to train our model. The second part will be
            used to evaluate the trained model’s performance. We do
            not want to evaluate the model on the same data it was
            trained on, because that would only measure how well it
            memorized that data, just as you would not reuse the
            questions from your homework exercises on the exam.
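As a minimal sketch of such a split, the snippet below shuffles the samples and holds back a fraction of them for evaluation. The helper name `train_test_split`, the 80/20 ratio, and the sample rows are illustrative assumptions, not something the text prescribes.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and return (train, test).

    Hypothetical helper: the 80/20 default is a common convention,
    not a requirement stated in the text.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


# Made-up samples: (sample_id, alcohol_percent, label)
samples = [(i, 4.0 + i % 10, "beer" if i % 2 else "wine") for i in range(10)]
train, test = train_test_split(samples, test_fraction=0.2, seed=42)
```

Shuffling before splitting helps keep the order in which the data was collected from leaking into either part.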

            There are times when the data we collect requires
            additional adjustment and processing, including, but not
            limited to, de-duplication, normalization, and error
            correction. All of these steps take place during data
            preparation. Our data doesn’t need any additional
            preparation, so let’s move on.
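For datasets that do need preparation, two of the steps named above could look roughly like the sketch below. It assumes rows are hashable tuples and uses min-max scaling as one possible form of normalization. The function names and sample values are hypothetical.

```python
def deduplicate(rows):
    """Drop exact duplicate rows, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up raw readings: (color, alcohol_percent); the third row is a duplicate.
raw = [(610, 5.0), (599, 13.0), (610, 5.0), (693, 14.0)]
clean = deduplicate(raw)
colors = min_max_normalize([color for color, _ in clean])
```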



            THE MODEL SELECTION PROCEDURE


            The next step in our workflow is to choose a model.
            Researchers and data scientists have created a wide
            range of models over the years. Some are best suited
            to image data, others to sequences (such as text or
            music), and still others to numerical or text-based data.
            Because we only have two features, color and alcohol
            content, we can use a small linear model, which is
            reasonably simple and should work.
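To make "a linear model with two features" concrete, here is a hedged sketch: the model computes a weighted sum of color and alcohol content plus a bias, and the sign of that score picks the class. The class name `LinearModel`, the labels "wine" and "beer", and the hand-picked weights are assumptions for illustration only. In practice the weights would be learned during training.

```python
from dataclasses import dataclass


@dataclass
class LinearModel:
    """Two-feature linear classifier: score = w_color*color + w_alcohol*alcohol + bias."""
    w_color: float
    w_alcohol: float
    bias: float

    def score(self, color, alcohol):
        return self.w_color * color + self.w_alcohol * alcohol + self.bias

    def predict(self, color, alcohol):
        # Hypothetical labels; a positive score maps to the first class.
        return "wine" if self.score(color, alcohol) > 0 else "beer"


# Hand-picked weights for illustration (not trained): this model ignores
# color and places the decision boundary at 9% alcohol.
model = LinearModel(w_color=0.0, w_alcohol=1.0, bias=-9.0)
strong = model.predict(color=0.5, alcohol=13.0)
light = model.predict(color=0.5, alcohol=5.0)
```

With only three numbers to adjust, a model like this is quick to fit and easy to inspect, which is why it suits a small two-feature problem.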



            TRAINING


            We will now go over the training phase, which is widely
            regarded as the most time-consuming aspect of machine


