Page 125 - Data Science Algorithms in a Week

Clustering into K Clusters

House ownership – choosing the number of clusters

Let us revisit the house ownership example from the first chapter.

Age   Annual income in USD   House ownership status
23    50000                  non-owner
37    34000                  non-owner
48    40000                  owner
52    30000                  non-owner
28    95000                  owner
25    78000                  non-owner
35    130000                 owner
32    105000                 owner
20    100000                 non-owner
40    60000                  owner
50    80000                  Peter

We would like to predict if Peter is a house owner using clustering.

Analysis:

Just as in the first chapter, we have to scale the data, because the income values are
orders of magnitude greater than the age values and would otherwise diminish the impact
of the age axis, which actually has good predictive power in this kind of problem. Older
people are expected to have had more time to settle down, save money, and buy a house
than younger people.
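
The rescaling can be sketched in a few lines of Python. This is a minimal illustration, not the book's own code: it assumes min-max scaling (subtract the column minimum, divide by the column range) computed over all eleven people including Peter, an assumption that agrees with the scaled values tabulated in this section. The function name `min_max_scale` is hypothetical.

```python
# Ages and annual incomes (USD) from the table above; the last entry is Peter.
ages = [23, 37, 48, 52, 28, 25, 35, 32, 20, 40, 50]
incomes = [50000, 34000, 40000, 30000, 95000, 78000,
           130000, 105000, 100000, 60000, 80000]


def min_max_scale(values):
    """Map values linearly onto [0, 1]: the minimum goes to 0, the maximum to 1.

    Assumption: this is the rescaling meant by the text; it is chosen
    because it matches the scaled values shown in the section's table.
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


scaled_ages = min_max_scale(ages)
scaled_incomes = min_max_scale(incomes)

# Print one row per person: age, scaled age, income, scaled income.
for age, s_age, income, s_income in zip(ages, scaled_ages, incomes, scaled_incomes):
    print(age, round(s_age, 5), income, round(s_income, 5))
```

After this scaling both features lie in the interval [0, 1], so neither axis dominates a distance computation. Under the same assumption, Peter (age 50, income 80000) maps to a scaled age of 0.9375 and a scaled income of 0.5.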

We apply the same rescaling as in Chapter 1 and get the following table:

Age   Scaled age   Annual income in USD   Scaled annual income   House ownership status
23    0.09375      50000                  0.2                    non-owner
37    0.53125      34000                  0.04                   non-owner
48    0.875        40000                  0.1                    owner
