Page 32 - Data Science Algorithms in a Week

Classification Using K Nearest Neighbors


            After scaling, we get the following data:

             Age  Scaled age  Annual income in USD  Scaled annual income  House ownership status
             23   0.09375     50,000                0.2                   Non-owner
             37   0.53125     34,000                0.04                  Non-owner
             48   0.875       40,000                0.1                   Owner
             52   1           30,000                0                     Non-owner
             28   0.25        95,000                0.65                  Owner
             25   0.15625     78,000                0.48                  Non-owner
             35   0.46875     130,000               1                     Owner
             32   0.375       105,000               0.75                  Owner
             20   0           100,000               0.7                   Non-owner
             40   0.625       60,000                0.3                   Owner
             50   0.9375      80,000                0.5                   ?

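
            The scaled columns are consistent with min-max scaling, which maps the smallest
            value in a column to 0 and the largest to 1. The following is a minimal sketch of
            that computation; the function name min_max_scale and the choice to take the
            minimum and maximum from the ten labeled rows only are assumptions made for
            illustration, not something the text prescribes.

```python
def min_max_scale(value, lo, hi):
    """Map value from the interval [lo, hi] onto [0, 1]."""
    return (value - lo) / (hi - lo)


# The ten labeled rows from the table (age in years, income in USD).
ages = [23, 37, 48, 52, 28, 25, 35, 32, 20, 40]
incomes = [50_000, 34_000, 40_000, 30_000, 95_000,
           78_000, 130_000, 105_000, 100_000, 60_000]

# Assumption: the scaling bounds come from the labeled rows only.
age_lo, age_hi = min(ages), max(ages)
inc_lo, inc_hi = min(incomes), max(incomes)

scaled_ages = [min_max_scale(a, age_lo, age_hi) for a in ages]
scaled_incomes = [min_max_scale(i, inc_lo, inc_hi) for i in incomes]

# The unclassified row (age 50, income 80,000) is scaled with the same bounds.
query_scaled = (min_max_scale(50, age_lo, age_hi),
                min_max_scale(80_000, inc_lo, inc_hi))

print(scaled_ages[0], scaled_incomes[0])  # 0.09375 0.2
print(query_scaled)                       # (0.9375, 0.5)
```
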
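            Now, if we apply the 1-NN algorithm with the Euclidean metric to the scaled data,
            Peter's nearest neighbor is the 40-year-old earning USD 60,000, who owns a house, so
            the algorithm classifies Peter as a likely house owner. Note that, without rescaling,
            the income differences dominate the distance and the algorithm would yield a
            different result. Refer to exercise 1.5.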
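
            The effect of rescaling on 1-NN can be checked with a short sketch. It assumes
            Python 3.8 or later (for math.dist); the helper name nearest_neighbor_label is
            hypothetical, and the data is copied from the table above.

```python
import math


def nearest_neighbor_label(train, query):
    """Return the label of the training point closest to query (1-NN, Euclidean)."""
    _, label = min(train, key=lambda row: math.dist(row[0], query))
    return label


labels = ["Non-owner", "Non-owner", "Owner", "Non-owner", "Owner",
          "Non-owner", "Owner", "Owner", "Non-owner", "Owner"]

# (age, annual income in USD) as given, before scaling.
raw_points = [(23, 50_000), (37, 34_000), (48, 40_000), (52, 30_000),
              (28, 95_000), (25, 78_000), (35, 130_000), (32, 105_000),
              (20, 100_000), (40, 60_000)]

# (scaled age, scaled annual income) from the table.
scaled_points = [(0.09375, 0.2), (0.53125, 0.04), (0.875, 0.1), (1.0, 0.0),
                 (0.25, 0.65), (0.15625, 0.48), (0.46875, 1.0), (0.375, 0.75),
                 (0.0, 0.7), (0.625, 0.3)]

scaled_pred = nearest_neighbor_label(list(zip(scaled_points, labels)), (0.9375, 0.5))
raw_pred = nearest_neighbor_label(list(zip(raw_points, labels)), (50, 80_000))

print(scaled_pred)  # Owner
print(raw_pred)     # Non-owner
```

            On the raw data the nearest point is the 25-year-old earning USD 78,000, because a
            gap of 2,000 in income outweighs a gap of 25 years in age.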



            Text classification - using non-Euclidean distances

            We are given the word counts of the keywords "algorithm" and "computer" for documents
            of the classes informatics and mathematics:

             Algorithm words per 1,000  Computer words per 1,000  Subject classification
             153                        150                       Informatics
             105                        97                        Informatics
             75                         125                       Informatics
             81                         84                        Informatics


