Page 125 - Data Science Algorithms in a Week
     Clustering into K Clusters
            House ownership – choosing the number of clusters
            Let us revisit the house ownership example from the first chapter.
             Age Annual income in USD House ownership status
             23   50000                  non-owner
             37   34000                  non-owner
             48   40000                  owner
             52   30000                  non-owner
             28   95000                  owner
             25   78000                  non-owner
             35   130000                 owner
             32   105000                 owner
             20   100000                 non-owner
             40   60000                  owner
             50   80000                  Peter
            We would like to predict whether Peter is a house owner using clustering.
            Analysis:
            Just as in the first chapter, we have to scale the data. Income values are orders of
            magnitude larger than age values, so without scaling the income axis would diminish
            the impact of the age axis, which actually has good predictive power in this kind of
            problem. Older people are expected to have had more time to settle down, save money,
            and buy a house than younger ones.
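The rescaling can be sketched as min-max scaling, which maps each column onto the interval [0, 1]. This is an illustration only: the helper name min_max_scale and the variable names are assumptions, not code from the book, and the formula is inferred from the scaled values shown in this chapter (for example, age 23 maps to 0.09375).

```python
# Ages and annual incomes (USD) of the ten people with known ownership status.
ages = [23, 37, 48, 52, 28, 25, 35, 32, 20, 40]
incomes = [50000, 34000, 40000, 30000, 95000,
           78000, 130000, 105000, 100000, 60000]


def min_max_scale(values, lo=None, hi=None):
    """Map values linearly so that lo -> 0.0 and hi -> 1.0.

    Hypothetical helper: lo and hi default to the minimum and
    maximum of `values` when they are not supplied.
    """
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    return [(v - lo) / (hi - lo) for v in values]


scaled_ages = min_max_scale(ages)
scaled_incomes = min_max_scale(incomes)

# Peter (age 50, income 80000) is scaled with the same bounds as the
# known data, so that his point is comparable with the others.
peter_age = min_max_scale([50], min(ages), max(ages))[0]
peter_income = min_max_scale([80000], min(incomes), max(incomes))[0]

for age, s_age, income, s_income in zip(ages, scaled_ages,
                                        incomes, scaled_incomes):
    print(age, s_age, income, s_income)
```

After scaling, both axes span the same range, so a difference of a few years in age is no longer dwarfed by a difference of a few thousand dollars in income.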
            We apply the same rescaling as in Chapter 1 and obtain the following table:
             Age Scaled age Annual income in USD Scaled annual income House ownership status
             23   0.09375    50000                  0.2                  non-owner
             37   0.53125    34000                  0.04                 non-owner
             48   0.875      40000                  0.1                  owner