Page 201 - Data Science Algorithms in a Week
11
Glossary of Algorithms and Methods in Data Science
k-Nearest Neighbors algorithm: An algorithm that classifies an unknown data
item into the class held by the majority of the k closest neighbors of that item.
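As an illustration, the classification rule above can be sketched in a few lines of Python. This is a minimal sketch, not the book's implementation: it assumes numeric feature tuples, Euclidean distance (via `math.dist`, Python 3.8+), and a small made-up dataset; the function name `knn_classify` is hypothetical.

```python
import math
from collections import Counter

def knn_classify(item, labeled_data, k):
    """Classify `item` by majority vote among its k nearest labeled neighbors.

    labeled_data: list of (point, label) pairs, where point is a tuple of numbers.
    Euclidean distance is an assumption; other metrics can be used.
    """
    # Sort the labeled points by their distance to the unknown item.
    neighbors = sorted(labeled_data, key=lambda pair: math.dist(item, pair[0]))
    # Count the labels of the k closest neighbors and return the most common one.
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]

data = [((1, 1), "cold"), ((2, 1), "cold"), ((8, 9), "warm"), ((9, 8), "warm"), ((1, 2), "cold")]
print(knn_classify((2, 2), data, k=3))  # cold
```

Choosing an odd k for two classes avoids ties in the vote.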
Naive Bayes classifier: A method of classifying a data item using Bayes' theorem on
conditional probabilities, P(A|B) = (P(B|A) * P(A)) / P(B), together with the
assumption that the given variables in the data are independent of one another
(given the class).
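A minimal sketch of a naive Bayes classifier for categorical variables follows, assuming a tiny made-up weather dataset and no smoothing of zero counts; `naive_bayes_classify` is a hypothetical name. Because P(B), the probability of the observed item, is the same for every class A, the sketch compares only P(B|A) * P(A), with P(B|A) factored into a product of per-variable probabilities under the independence assumption.

```python
from collections import Counter

def naive_bayes_classify(item, rows, labels):
    """Return the class A maximizing P(A) * P(B|A) for the observed item B.

    rows: tuples of categorical values; labels: the class of each row.
    P(B) is identical for every class, so it is left out of the comparison.
    No smoothing is applied, so an unseen value gives a zero probability.
    """
    class_counts = Counter(labels)
    # feature_counts[(i, value, cls)]: rows of class cls whose i-th variable equals value
    feature_counts = Counter()
    for row, cls in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, value, cls)] += 1

    best_class, best_score = None, -1.0
    for cls, count in class_counts.items():
        score = count / len(labels)  # prior P(A)
        for i, value in enumerate(item):
            # Independence assumption: P(B|A) is the product of per-variable terms.
            score *= feature_counts[(i, value, cls)] / count
        if score > best_score:
            best_class, best_score = cls, score
    return best_class

rows = [("sunny", "hot"), ("sunny", "cold"), ("rainy", "cold"), ("rainy", "hot"), ("sunny", "hot")]
labels = ["yes", "yes", "no", "no", "yes"]
print(naive_bayes_classify(("sunny", "hot"), rows, labels))  # yes
```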
Decision Tree: A model that classifies a data item by starting at the root and
following the branches whose conditions match the properties of the data item;
the class is given by the leaf node that is reached.
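The classification step can be sketched as a walk from the root to a leaf. The nested-dictionary tree format, the attribute names, and the function `classify` are illustrative assumptions, not a standard representation.

```python
def classify(tree, item):
    """Follow the branches matching `item` until a leaf (a class label) is reached.

    An internal node is a dict {"attribute": name, "branches": {value: subtree}};
    any non-dict value is a leaf and is taken to be the class.
    """
    while isinstance(tree, dict):
        value = item[tree["attribute"]]  # the item's value for the tested attribute
        tree = tree["branches"][value]   # descend along the matching branch
    return tree

weather_tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity", "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rainy": {"attribute": "wind", "branches": {"strong": "no", "weak": "yes"}},
    },
}
print(classify(weather_tree, {"outlook": "sunny", "humidity": "normal", "wind": "weak"}))  # yes
```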
Random Decision Tree: A decision tree in which, during construction, each branch
is formed using only a random subset of the available variables.
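The distinguishing step of a random decision tree is how a node picks the variable to branch on. The sketch below assumes that, among a random subset of `m` variables, the one with the highest information gain is chosen; the gain criterion, the toy data, and all function names are assumptions. Building a full tree would apply this choice recursively at each node.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the rows on attribute `attr`."""
    total = len(labels)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def choose_random_split(rows, labels, attributes, m, rng=random):
    """Pick the branching attribute for one node, considering only m random attributes."""
    candidates = rng.sample(attributes, m)  # the random subset of variables
    return max(candidates, key=lambda a: information_gain(rows, labels, a))

rows = [
    {"outlook": "sunny", "wind": "weak"},
    {"outlook": "sunny", "wind": "strong"},
    {"outlook": "rainy", "wind": "weak"},
    {"outlook": "rainy", "wind": "strong"},
]
labels = ["yes", "yes", "no", "no"]
print(choose_random_split(rows, labels, ["outlook", "wind"], m=2))  # outlook
```

With `m=1` the node simply branches on whichever variable was drawn, which is where the randomness between trees comes from.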
Random Forest: An ensemble of random decision trees, each constructed on a
random subset of the data sampled with replacement; a data item is classified into
the class that receives the majority vote from the trees.
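The ensemble mechanics, bootstrap sampling and majority voting, can be sketched as follows. To stay short, each tree here is deliberately simplified to a depth-one random decision tree (a stump that branches on one randomly chosen variable); the dataset and all function names are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(rows, labels, rng):
    """Draw len(rows) examples uniformly at random, with replacement."""
    idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
    return [rows[i] for i in idx], [labels[i] for i in idx]

def build_stump(rows, labels, attributes, rng):
    """Depth-one random tree: branch on one random attribute, leaves hold majority classes."""
    attr = rng.choice(attributes)
    by_value = {}
    for row, lab in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(lab)
    leaves = {v: Counter(labs).most_common(1)[0][0] for v, labs in by_value.items()}
    default = Counter(labels).most_common(1)[0][0]  # for values missing from the sample
    return lambda item: leaves.get(item[attr], default)

def build_forest(rows, labels, attributes, n_trees, seed=0):
    """Train each tree on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample_rows, sample_labels = bootstrap_sample(rows, labels, rng)
        forest.append(build_stump(sample_rows, sample_labels, attributes, rng))
    return forest

def forest_classify(forest, item):
    """Return the class with the majority vote among the trees."""
    votes = Counter(tree(item) for tree in forest)
    return votes.most_common(1)[0][0]

rows = [{"outlook": "sunny", "temp": "hot"}, {"outlook": "sunny", "temp": "hot"},
        {"outlook": "rainy", "temp": "cold"}, {"outlook": "rainy", "temp": "cold"}]
labels = ["yes", "yes", "no", "no"]
forest = build_forest(rows, labels, ["outlook", "temp"], n_trees=25)
print(forest_classify(forest, {"outlook": "sunny", "temp": "hot"}))
```

An odd number of trees avoids tied votes when there are two classes.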
K-means algorithm: A clustering algorithm that divides the dataset into k groups
such that the members of each group are as similar as possible, that is, as close
to each other as possible.
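A minimal sketch of k-means, assuming the common Lloyd iteration (assign each point to its nearest centroid, then move each centroid to the mean of its group), Euclidean distance, and initial centroids drawn at random from the data; `k_means` is a hypothetical name.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    """Return (centroids, clusters) after alternating assignment and update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct data points
    for _ in range(iterations):
        # Assignment step: each point joins the group of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its group (keep it if empty).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = k_means(points, k=2)
print(sorted(sorted(c) for c in clusters))  # [[(1, 1), (1, 2), (2, 1)], [(8, 8), (8, 9), (9, 8)]]
```

The result can depend on the initial centroids, so in practice the algorithm is often run several times with different seeds.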
Regression analysis: A method for estimating the unknown parameters of a
functional model that predicts the output variable from the input variables; for
example, estimating a and b in the linear model y = a*x + b.
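For the linear model y = a*x + b, one common way to estimate a and b is ordinary least squares; the glossary does not name a specific estimator, so that choice is an assumption here, as is the function name `fit_line`.

```python
def fit_line(xs, ys):
    """Estimate a and b in y = a*x + b by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # a = covariance(x, y) / variance(x); b makes the line pass through the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 2.0 1.0
```

Since the sample points lie exactly on y = 2*x + 1, the estimates recover a = 2 and b = 1.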

