Page 120 - Data Science Algorithms in a Week
Clustering into K Clusters
The red cluster, with the points (155,46), (164,53), (162,52) and (166,55), will have the centroid
((155+164+162+166)/4, (46+53+52+55)/4) = (161.75, 51.5).
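The centroid arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the helper name `centroid` is hypothetical and not taken from the book. It averages the height and weight coordinates of the red cluster's points separately.

```python
def centroid(points):
    """Return the mean (height, weight) of a list of (height, weight) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

# The red cluster's points, as listed in the text.
red = [(155, 46), (164, 53), (162, 52), (166, 55)]
print(centroid(red))  # (161.75, 51.5)
```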
When we reclassify the points using the new centroids, no point changes its cluster. The
blue cluster has the points (180,75), (174,71), (184,83), (168,63), (178,70), (170,59) and
(172,60). The red cluster has the points (155,46), (164,53), (162,52) and (166,55). Since the
assignment no longer changes, the clustering algorithm terminates with the clusters displayed in the following Image 5.2:
Image 5.2: Clustering of people by their height and weight
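The termination check described above (recompute the centroids, reassign every point, and stop when no assignment changes) can be sketched in Python as follows. This is a minimal sketch assuming Euclidean distance as the measure of closeness; the helper names `centroid` and `assign` are hypothetical. Index 0 stands for the blue cluster and index 1 for the red cluster.

```python
import math

def centroid(points):
    """Return the mean (height, weight) of a list of points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def assign(points, centroids):
    """Map each point to the index of its nearest centroid.

    Assumption: closeness is measured with Euclidean distance (math.dist).
    """
    return {
        p: min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    }

# Clusters as listed in the text; index 0 = blue, index 1 = red.
blue = [(180, 75), (174, 71), (184, 83), (168, 63), (178, 70), (170, 59), (172, 60)]
red = [(155, 46), (164, 53), (162, 52), (166, 55)]

centroids = [centroid(blue), centroid(red)]
labels = assign(blue + red, centroids)

# The algorithm terminates if every point stays in its current cluster.
stable = all(labels[p] == 0 for p in blue) and all(labels[p] == 1 for p in red)
print(stable)  # True
```

In a full k-means loop, this reassignment step would alternate with the centroid update until `stable` is true.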
Now we would like to classify the instance (172,60) as either male or female.
The instance (172,60) is in the blue cluster, so it is similar to the other points in the blue cluster.
Are the remaining people in the blue cluster more likely to be male or female? Five of the six
are male and only one is female. Since the majority of the people in the blue
cluster are male and the person (172,60) is in the blue cluster as well, we classify the person with
a height of 172 cm and a weight of 60 kg as male.
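The final step is a majority vote over the known labels in the cluster that contains the new instance. Here is a minimal Python sketch; the helper name `majority_label` is hypothetical. The label list encodes only what the text states (five males and one female among the six labelled people in the blue cluster), not which point carries which label.

```python
from collections import Counter

def majority_label(labels):
    """Return the most common label among a cluster's labelled members."""
    return Counter(labels).most_common(1)[0][0]

# The six labelled people in the blue cluster: 5 male, 1 female (per the text).
blue_labels = ["male"] * 5 + ["female"]
print(majority_label(blue_labels))  # male
```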