Page 33 - Data Science Algorithms in a Week
Classification Using K Nearest Neighbors
Algorithm words per 1,000   Computer words per 1,000   Subject classification
 73                          77                        Informatics
 90                          63                        Informatics
 20                           0                        Mathematics
 33                           0                        Mathematics
105                          10                        Mathematics
  2                           0                        Mathematics
 84                           2                        Mathematics
 12                           0                        Mathematics
 41                          42                        ?
Documents with a high frequency of both the words algorithm and computer fall into the
class of informatics. The class of mathematics also contains some documents with a high
count of the word algorithm; for example, a document concerned with the Euclidean
algorithm from the field of number theory. However, since mathematics tends to be less
applied than informatics in the area of algorithms, the word computer occurs in such
documents with a lower frequency.
We would like to classify a document that has 41 instances of the word algorithm per 1,000
words and 42 instances of the word computer per 1,000 words: