Page 125 - Data Science Algorithms in a Week
Clustering into K Clusters
House ownership – choosing the number of clusters
Let us return to the house ownership example from the first chapter.
Age  Annual income in USD  House ownership status
23   50000                 non-owner
37   34000                 non-owner
48   40000                 owner
52   30000                 non-owner
28   95000                 owner
25   78000                 non-owner
35   130000                owner
32   105000                owner
20   100000                non-owner
40   60000                 owner
50   80000                 Peter
We would like to use clustering to predict whether Peter is a house owner.
Analysis:
Just as in the first chapter, we have to scale the data. The income values are orders of
magnitude greater than the age values, so without scaling they would diminish the impact
of the age axis, which actually has good predictive power in this kind of problem. This is
because older people are expected to have had more time to settle down, save money, and
buy a house than younger ones.
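The following is a minimal sketch of such a rescaling. It assumes that the rescaling from Chapter 1 is a min-max scaling of each axis to the interval [0, 1], which is consistent with the scaled values listed for this data. The helper name min_max_scale is hypothetical and is used only for illustration. Peter's row is included, but this does not change the result here, since his age and income lie within the range of the other rows.

```python
# Sketch of min-max rescaling for the house ownership data.
# Assumption: the Chapter 1 rescaling maps the minimum of each axis to 0
# and the maximum to 1. `min_max_scale` is a hypothetical helper name.

ages = [23, 37, 48, 52, 28, 25, 35, 32, 20, 40, 50]  # last entry is Peter
incomes = [50000, 34000, 40000, 30000, 95000, 78000,
           130000, 105000, 100000, 60000, 80000]     # last entry is Peter


def min_max_scale(values):
    """Map values linearly so that the minimum becomes 0 and the maximum 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


scaled_ages = min_max_scale(ages)
scaled_incomes = min_max_scale(incomes)

# Print one row per person: age, scaled age, income, scaled income.
for age, s_age, income, s_income in zip(ages, scaled_ages,
                                        incomes, scaled_incomes):
    print(age, round(s_age, 5), income, round(s_income, 5))
```

After this rescaling both axes span the same range, so neither age nor income dominates the distances used by the clustering purely because of its units.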
We apply the same rescaling as in Chapter 1 and get the following table:
Age  Scaled age  Annual income in USD  Scaled annual income  House ownership status
23   0.09375     50000                 0.2                   non-owner
37   0.53125     34000                 0.04                  non-owner
48   0.875       40000                 0.1                   owner