Page 118 - Data Science Algorithms in a Week
Clustering into K Clusters
155 46 Long Female
162 52 Long Female
166 55 Long Female
172 60 Long ?
To simplify matters, we will remove the Hair length column. We also remove the Gender
column, since we would like to cluster the people in the table based only on their height
and weight. Using clustering, we would like to find out whether the 11th person in the
table is more likely to be a man or a woman:
Height in cm Weight in kg
180 75
174 71
184 83
168 63
178 70
170 59
164 53
155 46
162 52
166 55
172 60
Analysis:
We could scale the initial data, but to keep things simple, we will run the algorithm on
the unscaled data. We will group the data into two clusters, since there are two possible
genders in the table: male and female. We will then classify the person with a height of
172 cm and a weight of 60 kg as more likely to be a man if and only if the cluster that
contains this person has more men than women. Clustering is an efficient technique, so
classifying this way is fast, especially when there are many features to take into
account.
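The procedure described above can be sketched in a few lines of Python. This is only a minimal illustration, not necessarily the implementation used in this chapter. It rests on several assumptions: the function name k_means is hypothetical; the initial centroids are taken to be the first data point and the point farthest from it; and the gender labels for the first seven rows do not appear in this part of the table, so the values used for them below are assumed for illustration only.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

# Height in cm, weight in kg; the last row is the person to classify.
points = [
    (180, 75), (174, 71), (184, 83), (168, 63), (178, 70), (170, 59),
    (164, 53), (155, 46), (162, 52), (166, 55), (172, 60),
]

# ASSUMPTION: labels for the first seven rows are not shown in this excerpt
# and are filled in for illustration only. None marks the unknown gender.
genders = ["Male"] * 5 + ["Female"] * 5 + [None]


def k_means(points, initial_centroids, max_iter=100):
    """Alternate the assignment and centroid-update steps until stable."""
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centroids


# ASSUMPTION: a simple deterministic choice of the two starting centroids.
first = points[0]
second = max(points, key=lambda p: dist(p, first))
assignment, centroids = k_means(points, [first, second])

# Majority vote among the labeled people in the same cluster as the 11th person.
query_cluster = assignment[-1]
votes = Counter(
    g for g, a in zip(genders, assignment) if a == query_cluster and g is not None
)
prediction = votes.most_common(1)[0][0]
print(assignment, centroids, prediction)
```

Under these assumptions, the person with height 172 cm and weight 60 kg lands in the cluster of taller, heavier people, and the majority vote in that cluster decides the predicted gender. A different choice of initial centroids, or scaled data, may lead to different clusters.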