RECLAIM YOUR DIGITAL GOLD

Figure 1 shows each beverage’s color, its percentage of
alcohol, and whether it is beer or wine. These values will
be the basis of our ML training data.

DATA PREPARATION

Now that we’ve gathered our training data, it is time to
move on to the next stage of machine learning, known as
“Data Preparation.” During this stage, we load the data
into a suitable place and prepare it for use in training.

We’ll start by combining all of our data and then
shuffling it into a random order. We don’t want the order
in which the data is presented to influence what the model
learns, because order plays no part in determining whether
a beverage is beer or wine. To put it another way, when
classifying a beverage, we take neither its immediate
predecessor nor its immediate successor into account.
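
The combine-and-shuffle step can be sketched in a few lines
of Python. This is only a minimal illustration, not the
article’s own code: the sample values, the tuple layout
(color, alcohol percentage, label), and the helper name
combine_and_shuffle are all assumptions made for the example.

```python
import random

# Hypothetical samples: (color as a wavelength in nm, alcohol %, label).
# These values are invented for illustration only.
beer_samples = [(610, 4.5, "beer"), (599, 5.0, "beer"), (620, 5.2, "beer")]
wine_samples = [(693, 12.2, "wine"), (700, 13.0, "wine"), (685, 14.1, "wine")]


def combine_and_shuffle(*groups, seed=None):
    """Merge all groups into one new list and return it in random order."""
    combined = [sample for group in groups for sample in group]
    rng = random.Random(seed)  # a seeded generator makes the run repeatable
    rng.shuffle(combined)      # shuffles the new list in place
    return combined


dataset = combine_and_shuffle(beer_samples, wine_samples, seed=42)
for color, alcohol, label in dataset:
    print(color, alcohol, label)
```

Passing a seed is optional; it simply makes the same random
order reappear on every run, which helps when comparing
training results later.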

This is also a good time to run any relevant
visualizations of the data, to see whether there are
useful relationships between variables that we can take
advantage of, and whether the data is imbalanced. For
example, if we collected far more data points about beer
than wine, the model we train will be biased toward
guessing that almost everything it sees is beer, because
on that training data it would be correct most of the
time. In the real world, however, the model may encounter
beer and wine in equal amounts, in which case always
guessing “beer” would be wrong 50% of the time.
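
The imbalance problem can be made concrete with a small
Python sketch. It assumes a hypothetical collection of 90
beer samples and 10 wine samples (numbers chosen only for
illustration) and compares how a “always guess the most
common label” strategy scores on that data versus on an
evenly mixed set. The function names are likewise
assumptions for this example.

```python
from collections import Counter


def class_balance(labels):
    """Return each label's share of the dataset as a fraction of 1."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always guessing the most common training label."""
    majority_label = Counter(train_labels).most_common(1)[0][0]
    hits = sum(1 for label in test_labels if label == majority_label)
    return hits / len(test_labels)


# Hypothetical counts: far more beer than wine was collected,
# but beer and wine turn up equally often afterwards.
train_labels = ["beer"] * 90 + ["wine"] * 10
real_world_labels = ["beer"] * 50 + ["wine"] * 50

print(class_balance(train_labels))
print(majority_baseline_accuracy(train_labels, train_labels))
print(majority_baseline_accuracy(train_labels, real_world_labels))
```

With these made-up counts, always guessing “beer” is right
90% of the time on the collected data but only 50% of the
time on the even mix. Checking the class shares before
training is a cheap way to spot this kind of skew.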
