Page 30 - Data Science Algorithms in a Week
P. 30
Classification Using K Nearest Neighbors
k % of incorrectly classified points
1 2.97
3 3.24
5 3.29
7 3.40
9 3.57
Thus, for this particular type of classification problem, the k-NN algorithm achieves the
highest accuracy (least error rate) for k=1.
However, in real-life, problems we wouldn't usually not have complete data or a solution.
In such scenarios, we need to choose k appropriate to the partially available data. For this,
consult problem 1.4.
House ownership - data rescaling
For each person, we are given their age, yearly income, and whether their is a house or not:
Age Annual income in USD House ownership status
23 50,000 Non-owner
37 34,000 Non-owner
48 40,000 Owner
52 30,000 Non-owner
28 95,000 Owner
25 78,000 Non-owner
35 130,000 Owner
32 105,000 Owner
20 100,000 Non-owner
40 60,000 Owner
50 80,000 Peter
[ 18 ]