Page 118 - Data Science Algorithms in a Week

Clustering into K Clusters


             155           46            Long        Female
             162           52            Long        Female
             166           55            Long        Female
             172           60            Long        ?

            To simplify matters, we will remove the column Hair length. We also remove the column
            Gender, since we would like to cluster the people in the table based on their height and
            weight alone. Using clustering, we would like to find out whether the 11th person in the
            table is more likely to be a man or a woman:

             Height in cm  Weight in kg
             180           75
             174           71
             184           83
             168           63
             178           70
             170           59
             164           53
             155           46
             162           52
             166           55
             172           60

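The column removal described above can be sketched in Python as a simple projection of each row onto its first two fields. This is a minimal sketch that uses only the four rows of the full table visible on this page. The names `full_rows` and `keep_height_weight` are hypothetical, and `None` stands in for the unknown gender of the 11th person.

```python
# The last four rows of the full table: (height_cm, weight_kg, hair_length, gender).
# None marks the unknown gender of the 11th person.
full_rows = [
    (155, 46, "Long", "Female"),
    (162, 52, "Long", "Female"),
    (166, 55, "Long", "Female"),
    (172, 60, "Long", None),
]


def keep_height_weight(rows):
    """Drop the hair-length and gender columns, keeping (height, weight) pairs."""
    return [(height, weight) for height, weight, _hair, _gender in rows]


points = keep_height_weight(full_rows)
print(points)  # [(155, 46), (162, 52), (166, 55), (172, 60)]
```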
            Analysis:

            We could scale the initial data first, but to simplify matters we will use the unscaled
            data in the algorithm. We will cluster the data into two clusters, since there are two
            possible genders: male or female. We will then classify the person with height 172 cm and
            weight 60 kg as more likely a man if and only if there are more men than women among the
            other people in that person's cluster. Clustering is an efficient technique, so
            classifying this way is fast, especially when there is a large number of items to
            classify.
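The procedure above can be sketched in plain Python. This sketch assumes k-means with Euclidean distance on the unscaled height and weight values. The function names `kmeans` and `majority_label` are hypothetical, and the starting centroids are an arbitrary assumption: the tallest and the shortest person. Different starting centroids can produce different clusters.

```python
import math
from collections import Counter

# (height in cm, weight in kg) from the table above; index 10 is the 11th person.
people = [
    (180, 75), (174, 71), (184, 83), (168, 63), (178, 70), (170, 59),
    (164, 53), (155, 46), (162, 52), (166, 55), (172, 60),
]


def kmeans(points, initial_centroids, max_iterations=100):
    """Plain k-means: returns (cluster index per point, final centroids)."""
    centroids = list(initial_centroids)
    assignment = [None] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the clustering has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment, centroids


def majority_label(assignment, labels, index):
    """Most common known label among the other members of index's cluster.

    labels has one entry per point; None marks an unknown label.
    Ties go to the label encountered first.
    """
    cluster = assignment[index]
    counts = Counter(
        label
        for i, (a, label) in enumerate(zip(assignment, labels))
        if a == cluster and i != index and label is not None
    )
    return counts.most_common(1)[0][0] if counts else None


# Assumption: start from the tallest and the shortest person as the two centroids.
assignment, centroids = kmeans(people, initial_centroids=[(184, 83), (155, 46)])
print(assignment)      # [0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1]
print(assignment[10])  # 1
```

With this start, the 11th person (172 cm, 60 kg) ends up in the second cluster. That cluster also holds the six other people whose heights range from 155 cm to 170 cm. To finish the classification, build `labels` from the Gender column of the full table, using `None` for the 11th person, and call `majority_label(assignment, labels, 10)`. Scaled data could be clustered the same way by transforming `people` before calling `kmeans`.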