Page 30 - Data Science Algorithms in a Week

Classification Using K Nearest Neighbors


             k    % of incorrectly classified points

             1    2.97
             3    3.24
             5    3.29
             7    3.40
             9    3.57

            Thus, for this particular type of classification problem, the k-NN algorithm achieves the
            highest accuracy (lowest error rate) for k=1.
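
            The selection above amounts to taking the k with the smallest error rate in the
            table. A minimal Python sketch of that lookup follows; the name error_rates is
            hypothetical, and the values are copied from the table:

```python
# Error rates (% of incorrectly classified points) copied from the table.
# The name `error_rates` is illustrative and does not come from the original text.
error_rates = {1: 2.97, 3: 3.24, 5: 3.29, 7: 3.40, 9: 3.57}

# Pick the k whose error rate is smallest.
best_k = min(error_rates, key=error_rates.get)
print(best_k, error_rates[best_k])
```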

            However, in real-life problems we would not usually have complete data or a known
            solution. In such scenarios, we need to choose a k appropriate to the partially
            available data. For this, consult problem 1.4.



            House ownership - data rescaling

            For each person, we are given their age, yearly income, and whether they own a house or not:

             Age  Annual income in USD    House ownership status

             23   50,000                  Non-owner
             37   34,000                  Non-owner
             48   40,000                  Owner
             52   30,000                  Non-owner
             28   95,000                  Owner
             25   78,000                  Non-owner
             35   130,000                 Owner
             32   105,000                 Owner
             20   100,000                 Non-owner
             40   60,000                  Owner
             50   80,000                  Peter
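
            To illustrate why rescaling matters for this data, here is a rough Python sketch
            that classifies the last row with a 1-NN rule, first on the raw values and then
            after min-max rescaling each feature to the range [0, 1]. Several things here are
            assumptions made for illustration rather than taken from the text: that the row
            labeled Peter is the unlabeled point, the choice of k=1 and Euclidean distance,
            the min-max formula, and the helper names nearest_label and min_max_rescale.

```python
import math

# (age, annual income in USD, label) copied from the table.
people = [
    (23, 50_000, "Non-owner"),
    (37, 34_000, "Non-owner"),
    (48, 40_000, "Owner"),
    (52, 30_000, "Non-owner"),
    (28, 95_000, "Owner"),
    (25, 78_000, "Non-owner"),
    (35, 130_000, "Owner"),
    (32, 105_000, "Owner"),
    (20, 100_000, "Non-owner"),
    (40, 60_000, "Owner"),
]
peter = (50, 80_000)  # assumption: the row labeled "Peter" is the point to classify


def nearest_label(points, query):
    """Return the label of the point closest to `query` (1-NN, Euclidean distance)."""
    closest = min(points, key=lambda p: math.dist(p[:2], query))
    return closest[2]


def min_max_rescale(points, query):
    """Rescale both features to [0, 1] using the min and max over the labeled points."""
    lows = [min(p[i] for p in points) for i in range(2)]
    highs = [max(p[i] for p in points) for i in range(2)]

    def scale(values):
        return tuple((v - lo) / (hi - lo) for v, lo, hi in zip(values, lows, highs))

    scaled_points = [scale(p[:2]) + (p[2],) for p in points]
    return scaled_points, scale(query)


raw_label = nearest_label(people, peter)
scaled_people, scaled_peter = min_max_rescale(people, peter)
scaled_label = nearest_label(scaled_people, scaled_peter)

print("raw:", raw_label)
print("rescaled:", scaled_label)
```

            On the raw values the income difference dominates the distance, so the nearest
            neighbor is the 25-year-old earning 78,000 (Non-owner). After rescaling, the
            nearest neighbor is the 40-year-old earning 60,000 (Owner). The two runs therefore
            disagree under these assumptions.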




