Page 32 - Data Science Algorithms in a Week
P. 32
Classification Using K Nearest Neighbors
After scaling, we get the following data:
Age Scaled age Annual income in USD Scaled annual House ownership status
income
23 0.09375 50,000 0.2 Non-owner
37 0.53125 34,000 0.04 Non-owner
48 0.875 40,000 0.1 Owner
52 1 30,000 0 Non-owner
28 0.25 95,000 0.65 Owner
25 0.15625 78,000 0.48 Non-owner
35 0.46875 130,000 1 Owner
32 0.375 105,000 0.75 Owner
20 0 100,000 0.7 Non-owner
40 0.625 60,000 0.3 Owner
50 0.9375 80,000 0.5 ?
Now, if we apply the 1-NN algorithm with the Euclidean metric, we will find out that Peter
more than likely owns a house. Note that, without rescaling, the algorithm would yield a
different result. Refer to exercise 1.5.
Text classification - using non-Euclidean
distances
We are given the word counts of the keywords algorithm and computer for documents of
the classes, informatics and mathematics:
Algorithm words per 1,000 Computer words per 1,000 Subject classification
153 150 Informatics
105 97 Informatics
75 125 Informatics
81 84 Informatics
[ 20 ]