Page 33 - Data Science Algorithms in a Week

Classification Using K Nearest Neighbors


             algorithm per 1,000 words  computer per 1,000 words  Class
             73                         77                        Informatics
             90                         63                        Informatics
             20                         0                         Mathematics
             33                         0                         Mathematics
             105                        10                        Mathematics
             2                          0                         Mathematics
             84                         2                         Mathematics
             12                         0                         Mathematics
             41                         42                        ?

            Documents with a high rate of both words, algorithm and computer, belong to the
            informatics class. The mathematics class also contains some documents with a high
            count of the word algorithm; for example, a document about the Euclidean algorithm
            from number theory. However, since mathematics tends to be less applied than
            informatics in the area of algorithms, the word computer occurs less frequently in
            such documents.
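
            The two features used here are word rates per 1,000 words. As a minimal sketch of how
            such a rate could be computed from raw text: the function name rate_per_1000 and the
            simple letters-only tokenization are assumptions made for illustration, not a method
            given in the text.

```python
import re


def rate_per_1000(text, word):
    """Return how often `word` occurs per 1,000 words of `text`.

    Assumption: a "word" is any run of ASCII letters, compared
    case-insensitively. Real documents may need a better tokenizer.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    count = sum(1 for token in tokens if token == word.lower())
    # Multiply before dividing to keep the arithmetic exact for small inputs.
    return 1000 * count / len(tokens)


sample = "The algorithm runs on a computer and the algorithm halts"
print(rate_per_1000(sample, "algorithm"))
print(rate_per_1000(sample, "computer"))
```

            A document is then represented by the pair (algorithm rate, computer rate), which is
            the form of the rows in the table above.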

            We would like to classify a document that has 41 instances of the word algorithm per 1,000
            words and 42 instances of the word computer per 1,000 words:
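
            A minimal sketch of how the k-nearest neighbors vote could be carried out for this
            document. It assumes the Euclidean distance and uses only the rows visible in this
            part of the table. Because the table continues from the previous page, the printed
            class illustrates the procedure and is not necessarily the result for the full
            dataset. The names training and knn_classify are hypothetical.

```python
import math
from collections import Counter

# Rows visible in this part of the table:
# (algorithm per 1,000 words, computer per 1,000 words, class).
# Assumption: earlier rows of the table are not included here.
training = [
    (73, 77, "Informatics"),
    (90, 63, "Informatics"),
    (20, 0, "Mathematics"),
    (33, 0, "Mathematics"),
    (105, 10, "Mathematics"),
    (2, 0, "Mathematics"),
    (84, 2, "Mathematics"),
    (12, 0, "Mathematics"),
]


def knn_classify(point, data, k):
    """Classify `point` by majority vote among its k nearest rows of `data`."""
    def distance(row):
        # Euclidean distance in the (algorithm, computer) plane.
        return math.hypot(row[0] - point[0], row[1] - point[1])

    nearest = sorted(data, key=distance)[:k]
    votes = Counter(label for _, _, label in nearest)
    # On a tied vote, the class of the closest tied neighbor wins,
    # because Counter keeps insertion order among equal counts.
    return votes.most_common(1)[0][0]


print(knn_classify((41, 42), training, k=3))
```

            The choice of k and of the distance metric both affect the outcome, so it is worth
            trying several values of k before settling on a classification.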






























