Page 117 - Data Science Algorithms in a Week

Clustering into K Clusters


            The centroid of the second cluster is (1/3)*(115k+130k+135k)=(1/3)*380k~126.66k.

            Using the new centroids we reclassify the features as follows:
                      The first cluster with the centroid 66.25k will contain the features 40k, 55k, 70k.
                      The second cluster with the centroid 126.66k will contain the features 100k, 115k,
                      130k, 135k.

            We notice that the feature 100k moved from the first cluster into the second since now it is
            closer to the centroid of the second cluster (distance |100k-126.66k|=26.66k) than to the
            centroid of the first cluster (distance |100k-66.25k|=33.75k). Since the features in the clusters
            changed, we have to recompute the centroids again.
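
            The reclassification step above can be sketched in Python as follows. This is only a
            minimal illustration: the helper name nearest_centroid and the list names are
            hypothetical, the incomes are expressed in thousands, and the centroids 66.25 and
            380/3 are the ones computed in the text.

```python
# Centroids from the text, in thousands: 66.25 and 380/3 (about 126.66).
centroids = [66.25, 380 / 3]
incomes = [40, 55, 70, 100, 115, 130, 135]


def nearest_centroid(x, centroids):
    """Return the index of the centroid closest to x (1D distance |x - c|)."""
    return min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))


# Map every feature to the index of its nearest centroid (0 = first, 1 = second).
assignment = {x: nearest_centroid(x, centroids) for x in incomes}

# Distances of the feature 100 to both centroids, as compared in the text.
dist_first = abs(100 - centroids[0])
dist_second = abs(100 - centroids[1])
print(assignment)
print(dist_first, round(dist_second, 2))
```

            Under these assumptions the feature 100 is assigned index 1, that is, the second cluster.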

            The centroid of the first cluster is (1/3)*(40k+55k+70k)=(1/3)*165k=55k. The centroid of the
            second cluster is (1/4)*(100k+115k+130k+135k)=(1/4)*480k=120k.
            Using these centroids we reclassify the items into the clusters. The first cluster, with the
            centroid 55k, will contain the features 40k, 55k, 70k. The second cluster, with the centroid
            120k, will contain the features 100k, 115k, 130k, 135k. Thus, upon the update of the
            centroids, the clusters did not change, so their centroids will remain the same.

            Therefore the algorithm terminates with the two clusters: the first cluster having the
            features 40k, 55k, 70k; the second cluster having the features 100k, 115k, 130k, 135k.
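
            The whole procedure, alternating between reclassifying the features and recomputing the
            centroids until the clusters stop changing, might be sketched as below. This is a
            simplified one-dimensional illustration rather than a definitive implementation: the
            function names are hypothetical, the incomes are in thousands, and the run is assumed
            to start from the centroids 66.25 and 380/3 given in the text.

```python
def assign_to_clusters(features, centroids):
    """Put each feature into the cluster whose centroid is nearest."""
    clusters = [[] for _ in centroids]
    for x in features:
        nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        clusters[nearest].append(x)
    return clusters


def k_means_1d(features, centroids):
    """Alternate centroid updates and reassignment until the clusters stop changing."""
    clusters = assign_to_clusters(features, centroids)
    while True:
        # New centroid = mean of the cluster; an empty cluster keeps its old centroid.
        centroids = [
            sum(c) / len(c) if c else old
            for c, old in zip(clusters, centroids)
        ]
        new_clusters = assign_to_clusters(features, centroids)
        if new_clusters == clusters:
            return clusters, centroids
        clusters = new_clusters


incomes = [40, 55, 70, 100, 115, 130, 135]  # in thousands
clusters, centroids = k_means_1d(incomes, [66.25, 380 / 3])
print(clusters)   # [[40, 55, 70], [100, 115, 130, 135]]
print(centroids)  # [55.0, 120.0]
```

            Termination is tested on cluster membership rather than on the centroid values, which
            avoids comparing floating-point numbers for equality.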



            Gender classification - clustering to classify


            We take the data from the gender classification problem in Chapter 2, Naive Bayes,
            Analysis point 6:

             Height in cm  Weight in kg  Hair length  Gender

             180           75            Short        Male
             174           71            Short        Male
             184           83            Short        Male
             168           63            Short        Male
             178           70            Long         Male
             170           59            Long         Female
             164           53            Short        Female


