Reclaim YOUR DIGITAL GOLD

DATA COLLECTION HARVESTING


            In addition, we’ll need to split the data into two parts.
            The first part, which will contain the majority of the
            data, will be used to train our model. The second part
            will be used to evaluate the performance of the trained
            model. We want to measure how well the model handles
            data it has never seen, not how well it remembers the
            data it was trained on, just as you would not reuse the
            questions from your homework exercises on the exam.
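            The split described above can be sketched in a few lines
            of standard-library Python. This is a minimal sketch, not
            the book's own code: the function name train_test_split,
            the 80/20 ratio, the fixed seed, and the sample rows of
            (color, alcohol_content, label) are all illustrative
            assumptions.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The larger slice trains the model; the held-out slice is only for evaluation
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical samples: (color, alcohol_content, label)
data = [(0.1 * i, 5.0 + i, i % 2) for i in range(10)]
train, test = train_test_split(data)
print(len(train), len(test))  # 8 2
```

            Shuffling before splitting matters: if the rows were
            collected in some order (for example, all of one class
            first), slicing without shuffling would give the two
            parts very different contents.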

            Sometimes the data we collect requires additional
            tweaking and processing, such as de-duplication,
            normalization, and error correction. All of these steps
            belong to the data preparation phase. Our situation
            doesn’t call for any further data preparation, so let’s
            move on.
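            Two of the preparation steps mentioned above,
            de-duplication and normalization, might look like the
            following sketch. The helper names, the choice of min-max
            scaling as the normalization method, and the sample rows
            (a color reading and an alcohol percentage) are
            hypothetical, chosen only for illustration.

```python
def deduplicate(rows):
    """Drop exact duplicate rows, keeping the first occurrence in order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique

def min_max_normalize(values):
    """Rescale numbers linearly into [0, 1]; constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical raw rows: (color_reading, alcohol_percent); note the duplicate
rows = [(610, 13.5), (610, 13.5), (420, 5.0), (700, 14.0)]
unique = deduplicate(rows)
print(unique)  # [(610, 13.5), (420, 5.0), (700, 14.0)]

# Normalize each feature column separately so both share the same 0-1 scale
colors = min_max_normalize([c for c, _ in unique])
alcohol = min_max_normalize([a for _, a in unique])
```

            Normalizing each column on its own keeps a feature with
            large raw numbers (such as the color reading here) from
            dominating one with small numbers (the alcohol percent).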



            THE MODEL SELECTION PROCEDURE


            The next step in our workflow is to choose a model.
            Researchers and data scientists have created a wide
            range of models over the years. Some are best suited
            for image data, others for sequences (such as text or
            music), and still others for numerical or text-based
            data. Because we have only two features, color and
            alcohol content, we can use a small linear model,
            which is simple and should work well enough.
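            To make "a small linear model" concrete, here is a sketch
            of one with exactly two inputs. It assumes a two-class
            problem and features already scaled to the range 0 to 1;
            the class name LinearModel and the weight values are
            hypothetical. In practice the weights are not set by
            hand but learned during training.

```python
from dataclasses import dataclass

@dataclass
class LinearModel:
    """score = w_color * color + w_alcohol * alcohol + bias"""
    w_color: float
    w_alcohol: float
    bias: float

    def score(self, color: float, alcohol: float) -> float:
        # A weighted sum of the two features plus a constant offset
        return self.w_color * color + self.w_alcohol * alcohol + self.bias

    def predict(self, color: float, alcohol: float) -> int:
        # Linear decision boundary: class 1 on the positive side, class 0 otherwise
        return 1 if self.score(color, alcohol) > 0 else 0

model = LinearModel(w_color=1.0, w_alcohol=1.0, bias=-1.0)  # hypothetical weights
print(model.predict(0.9, 0.8))  # 1
print(model.predict(0.1, 0.2))  # 0
```

            With only three numbers to learn (two weights and a
            bias), the model draws a straight line through the
            color/alcohol plane and labels each side as one class.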



            TRAINING


            We will now go over the training phase, which is widely
            regarded as the most time-consuming aspect of machine


