Page 125 - Data Science Algorithms in a Week

Clustering into K Clusters


            House ownership – choosing the number of clusters

            Let us revisit the house ownership example from the first chapter.

             Age  Annual income in USD  House ownership status
             23   50000                 non-owner
             37   34000                 non-owner
             48   40000                 owner
             52   30000                 non-owner
             28   95000                 owner
             25   78000                 non-owner
             35   130000                owner
             32   105000                owner
             20   100000                non-owner
             40   60000                 owner
             50   80000                 Peter
            We would like to use clustering to predict whether Peter is a house owner.

            Analysis:

            Just as in the first chapter, we have to scale the data, since the values on the income
            axis are orders of magnitude greater than those on the age axis and would otherwise
            diminish the impact of age, which actually has good predictive power in this kind of
            problem. This is because older people are expected to have had more time to settle
            down, save money, and buy a house than younger people.
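
            As a minimal sketch of such a rescaling, assume it is min-max scaling,
            (x - min) / (max - min), computed per column over all eleven rows including Peter.
            This is an assumption, since the Chapter 1 formula is not restated in this section.
            The helper name min_max_scale is ours, chosen for illustration only.

```python
# Ages and incomes from the table above; Peter is the last row.
ages = [23, 37, 48, 52, 28, 25, 35, 32, 20, 40, 50]
incomes = [50000, 34000, 40000, 30000, 95000, 78000,
           130000, 105000, 100000, 60000, 80000]


def min_max_scale(values):
    """Map values linearly onto [0, 1] using (x - min) / (max - min).

    Assumed form of the rescaling; not quoted from the book.
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


scaled_ages = min_max_scale(ages)
scaled_incomes = min_max_scale(incomes)

# Print each row with its scaled coordinates.
for age, s_age, income, s_income in zip(ages, scaled_ages, incomes, scaled_incomes):
    print(age, round(s_age, 5), income, round(s_income, 5))
```

            Under this assumption, age 23 maps to 3/32 = 0.09375 and an income of 50000 maps to
            20000/100000 = 0.2, so both features lie in [0, 1] and contribute on a comparable
            scale to distance computations.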

            We apply the same rescaling as in Chapter 1 and get the following table:

             Age  Scaled age  Annual income in USD  Scaled annual income  House ownership status
             23   0.09375     50000                 0.2                   non-owner
             37   0.53125     34000                 0.04                  non-owner
             48   0.875       40000                 0.1                   owner


