Page 41 - AI & Machine Learning for Beginners: A Guided Workbook

Sample Data Table:


                Color (let’s say      Percentage of      Label
                in hex code)          alcohol            (wine or beer)
                610                   5                  Beer
                599                   13                 Wine
                693                   14                 Wine

         These rows form our training data: each one pairs a beverage’s
         features (color and alcohol) with its correct label.


         3. Data Preparation

             •  Combine and Shuffle: Merge all collected data,
                randomizing the order to avoid bias from the data
                sequence.
             •  Visualize Your Data: Check for correlations between
                features and any imbalances (e.g., too many beer samples
                relative to wine).
             •  Split the Data:
                    o  Training Set: The majority of the data, used to build
                        the model.
                    o  Test Set: A smaller held-out portion used to evaluate
                        the model’s performance (training/test splits of
                        80/20 or 70/30 are typical).
             •  Additional Cleaning: If necessary, perform de-duplication,
                normalization, and error correction to ensure data quality.
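
         The preparation steps above can be sketched in a few lines of
         Python. This is a minimal illustration, not a prescribed method:
         the helper name shuffle_and_split, the fixed seed, and the 20%
         test fraction are assumptions made for the example, and the three
         rows are simply the ones from the sample table.

```python
import random
from collections import Counter

# Rows from the sample table: (color, alcohol_percent, label)
samples = [
    (610, 5, "Beer"),
    (599, 13, "Wine"),
    (693, 14, "Wine"),
]

def shuffle_and_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows, then return (train, test).

    Hypothetical helper: the seed only makes the shuffle repeatable.
    """
    rows = list(rows)                      # copy so the input is untouched
    random.Random(seed).shuffle(rows)      # randomize the order
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

# Count labels to spot imbalances (e.g., too many beer samples).
label_counts = Counter(label for _, _, label in samples)

train_set, test_set = shuffle_and_split(samples)
```

         With only three rows the split is not meaningful; a real dataset
         would need many more samples of each label.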


         4. Model Selection

             •  Choice of Model: For simplicity, we opt for a linear model
                because we only need to separate the beverages based on
                two features.
             •  Linear Model Overview: The model is represented by a
                simple linear equation:

                y = m*x + b
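
         As a rough sketch, the equation can be written as a small Python
         function. Here x is the input, m is the slope, and b is the
         y-intercept. The function name linear_model and the parameter
         values below are assumptions chosen only to show the arithmetic;
         they are not values learned from the sample data.

```python
def linear_model(x, m, b):
    """Evaluate y = m*x + b for a single input x."""
    return m * x + b

# Hypothetical slope and intercept, picked purely for illustration.
y = linear_model(x=5, m=2.0, b=1.0)
```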