Page 41 - AI & Machine Learning for Beginners: A Guided Workbook

Sample Data Table:

Color (let’s say in hex code)   Percentage of alcohol   Label (wine or beer)
610                             5                       Beer
599                             13                      Wine
693                             14                      Wine

These values form our training data: each row pairs a beverage’s
features (color and alcohol percentage) with its correct label.
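
To make the feature/label pairing concrete, here is a minimal sketch that stores the three sample rows and separates them into features and labels. The variable names (samples, features, labels) are illustrative assumptions, not part of the workbook.

```python
# Each row is (color, alcohol_percent, label), copied from the sample table.
samples = [
    (610, 5, "Beer"),
    (599, 13, "Wine"),
    (693, 14, "Wine"),
]

# Separate each row into its feature pair and its label.
features = [(color, alcohol) for color, alcohol, _ in samples]
labels = [label for _, _, label in samples]

print(features)
print(labels)
```

The model only ever sees the feature pairs as input; the labels are the answers it is trained to reproduce.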

3. Data Preparation

• Combine and Shuffle: Merge all collected data and
randomize its order so that the sequence in which samples
were collected does not bias training.
• Visualize Your Data: Check for correlations between
features and any imbalances (e.g., too many beer samples
relative to wine).
• Split the Data:
o Training Set: The majority of the data, used to build the
model.
o Test Set: The smaller remaining portion, used to evaluate
the model’s performance (typical train/test splits are
80/20 or 70/30).
• Additional Cleaning: If necessary, perform de-duplication,
normalization, and error correction to ensure data quality.
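
The preparation steps above can be sketched in a few lines of plain Python. This is only one possible approach, under stated assumptions: the helper names prepare and label_counts are hypothetical, the fixed seed is there only to make the shuffle repeatable, and the fourth row is a deliberately repeated copy of the first row so that de-duplication has something to remove.

```python
import random

def prepare(samples, test_fraction=0.2, seed=0):
    """De-duplicate, shuffle, and split samples into (train, test)."""
    # Drop exact duplicates, keeping the first occurrence of each row.
    unique = list(dict.fromkeys(samples))
    # Shuffle with a seeded generator so the result is reproducible.
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Hold out test_fraction of the rows (at least one) as the test set.
    n_test = max(1, round(len(unique) * test_fraction))
    return unique[:-n_test], unique[-n_test:]

def label_counts(samples):
    """Count rows per label to spot imbalances (e.g., too much beer)."""
    counts = {}
    for *_, label in samples:
        counts[label] = counts.get(label, 0) + 1
    return counts

# The three rows from the sample table, plus one intentional duplicate.
samples = [
    (610, 5, "Beer"),
    (599, 13, "Wine"),
    (693, 14, "Wine"),
    (610, 5, "Beer"),  # duplicate of the first row, added for illustration
]

train, test = prepare(samples, test_fraction=0.2)
print("label counts:", label_counts(samples))
print("train:", train)
print("test:", test)
```

With so few rows the split is only a demonstration; a real dataset would need many more samples per label for the test set to say anything useful.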

4. Model Selection

• Choice of Model: For simplicity, we opt for a linear model
because we only need to separate the beverages based on
two features.
• Linear Model Overview: The model is represented by a
simple linear equation:

y = m*x + b
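
As a rough illustration of how such a line could separate the two beverages, the sketch below assumes x is the color value and y is the alcohol percentage, and labels a sample by which side of the line it falls on. This reading, the function names line and classify, and the hand-picked values of m and b are all assumptions for illustration; in practice m and b would be learned from the training data.

```python
def line(x, m, b):
    """Evaluate the linear equation y = m*x + b."""
    return m * x + b

def classify(color, alcohol, m, b):
    """Label a sample by whether it lies above or below the line."""
    return "Wine" if alcohol > line(color, m, b) else "Beer"

# Hypothetical, hand-picked parameters (not learned): a flat line at 10% alcohol.
m, b = 0.0, 10.0

# The three rows from the sample table.
for color, alcohol, label in [(610, 5, "Beer"), (599, 13, "Wine"), (693, 14, "Wine")]:
    guess = classify(color, alcohol, m, b)
    print(color, alcohol, "predicted:", guess, "actual:", label)
```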