Page 117 - Data Science Algorithms in a Week
Clustering into K Clusters
The centroid of the second cluster is (1/3)*(115k+130k+135k)=(1/3)*380k~126.66k.
Using the new centroids we reclassify the features as follows:
The first cluster with the centroid 66.25k will contain the features 40k, 55k, 70k.
The second cluster with the centroid 126.66k will contain the features 100k, 115k,
130k, 135k.
We notice that the feature 100k moved from the first cluster into the second, since it is now
closer to the centroid of the second cluster (distance |100k-126.66k|=26.66k) than to the
centroid of the first cluster (distance |100k-66.25k|=33.75k). Since the features in the clusters
changed, we have to recompute the centroids.
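The reclassification step above can be sketched in a few lines of Python. This is a minimal sketch, not the book's own code: it assumes the features are plain numbers in thousands, uses the one-dimensional distance |x - c| from the text, and starts from the centroids 66.25k and 380k/3 computed earlier. The function name `assign` is hypothetical, and ties are broken toward the first centroid (an assumption; the text has no ties).

```python
# Features and centroids in thousands (k), as in the text.
features = [40, 55, 70, 100, 115, 130, 135]
centroids = [66.25, 380 / 3]  # 66.25k and (1/3)*380k, about 126.67k

def assign(features, centroids):
    """Put each feature into the cluster whose centroid is nearest (1-D distance |x - c|).

    Hypothetical helper; on a tie the first (lowest-index) centroid wins.
    """
    clusters = [[] for _ in centroids]
    for x in features:
        distances = [abs(x - c) for c in centroids]
        clusters[distances.index(min(distances))].append(x)
    return clusters

clusters = assign(features, centroids)
print(clusters)  # [[40, 55, 70], [100, 115, 130, 135]]

# Why 100k moves: it is closer to the second centroid than to the first.
print(abs(100 - centroids[0]))            # 33.75
print(round(abs(100 - centroids[1]), 2))  # 26.67 (the text truncates this to 26.66)
```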
The centroid of the first cluster is (1/3)*(40k+55k+70k)=(1/3)*165k=55k. The centroid of the
second cluster is (1/4)*(100k+115k+130k+135k)=(1/4)*480k=120k.
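The centroid update is just the arithmetic mean of each cluster, which a short Python sketch can confirm. The function name `update_centroids` is hypothetical, and the clusters are the ones obtained in the reclassification above, with values in thousands.

```python
# Clusters after the reclassification (values in thousands).
clusters = [[40, 55, 70], [100, 115, 130, 135]]

def update_centroids(clusters):
    """Recompute each centroid as the arithmetic mean of the features in its cluster."""
    return [sum(cluster) / len(cluster) for cluster in clusters]

print(update_centroids(clusters))  # [55.0, 120.0]
```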
Using these centroids, we reclassify the features into the clusters. The first cluster, with the
centroid 55k, will contain the features 40k, 55k, 70k. The second cluster, with the centroid
120k, will contain the features 100k, 115k, 130k, 135k. Thus the updated centroids did not
change the clusters, so the centroids will remain the same as well.
Therefore the algorithm terminates with the two clusters: the first cluster having the
features 40k, 55k, 70k; the second cluster having the features 100k, 115k, 130k, 135k.
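The whole procedure, alternating reclassification and centroid updates until the clusters stop changing, can be sketched as one loop. This is a minimal one-dimensional sketch under stated assumptions, not the book's implementation: the function name `k_means_1d` is hypothetical, it starts from the centroids 66.25k and 380k/3 used in this section, ties go to the first centroid, and an empty cluster keeps its previous centroid (a safeguard the text does not discuss).

```python
def k_means_1d(features, centroids):
    """Hypothetical 1-D k-means: repeat assignment and centroid update until clusters are stable."""
    clusters = None
    while True:
        # Assignment step: each feature goes to its nearest centroid (first one on a tie).
        new_clusters = [[] for _ in centroids]
        for x in features:
            distances = [abs(x - c) for c in centroids]
            new_clusters[distances.index(min(distances))].append(x)
        # Termination: the clusters did not change, so the centroids would not change either.
        if new_clusters == clusters:
            return clusters, centroids
        clusters = new_clusters
        # Update step: mean of each cluster; an empty cluster keeps its old centroid (assumption).
        centroids = [sum(c) / len(c) if c else old for c, old in zip(clusters, centroids)]

features = [40, 55, 70, 100, 115, 130, 135]  # in thousands
result_clusters, result_centroids = k_means_1d(features, [66.25, 380 / 3])
print(result_clusters)   # [[40, 55, 70], [100, 115, 130, 135]]
print(result_centroids)  # [55.0, 120.0]
```

Comparing the clusters rather than the centroids mirrors the stopping rule in the text: once a reclassification leaves every feature where it was, the recomputed centroids are necessarily the same.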
Gender classification - clustering to classify
We take the data from the gender classification problem in Chapter 2, Naive Bayes,
Analysis point 6:
Height (cm)   Weight (kg)   Hair length   Gender
180           75            Short         Male
174           71            Short         Male
184           83            Short         Male
168           63            Short         Male
178           70            Long          Male
170           59            Long          Female
164           53            Short         Female