Page 201 - Data Science Algorithms in a Week

11









                      Glossary of Algorithms and Methods in Data Science






                      k-Nearest Neighbors algorithm: An algorithm that classifies an unknown data
                      item by assigning it the class held by the majority of its k closest neighbors.
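
As a minimal sketch of the majority vote described above, assuming Euclidean distance and a small made-up training set (the function name `knn_classify` and the data are illustrative, not from the book):

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Return the majority class among the k training points nearest to query.

    train is a list of (point, label) pairs; point and query are equal-length
    tuples of numbers. Euclidean distance is an assumption of this sketch.
    """
    # Sort the training items by distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels among those neighbors and return the most common one.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical data: two well-separated groups of points.
train = [((1.0, 1.0), "red"), ((1.5, 2.0), "red"), ((2.0, 1.0), "red"),
         ((6.0, 6.0), "blue"), ((7.0, 6.5), "blue"), ((6.5, 7.0), "blue")]
print(knn_classify(train, (1.2, 1.4), k=3))
```

Choosing an odd k is a common way to reduce ties when there are two classes.
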
                      Naive Bayes classifier: A classifier that applies Bayes' theorem on conditional
                      probabilities, P(A|B) = (P(B|A) * P(A)) / P(B), under the additional assumption
                      that the variables in the data are independent of each other given the class.
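
The following sketch shows one way the formula and the independence assumption could combine for categorical data. The function name `naive_bayes_posteriors` and the toy weather data are assumptions made for illustration, and no smoothing is applied:

```python
from collections import Counter, defaultdict

def naive_bayes_posteriors(rows, labels, query):
    """Return P(class | query) for each class, assuming independent features.

    rows is a list of feature tuples and labels holds the matching classes.
    Without smoothing, an unseen feature value gives a class probability 0.
    """
    class_counts = Counter(labels)
    # feature_counts[(class, feature index, value)] = number of occurrences
    feature_counts = defaultdict(int)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(label, i, value)] += 1

    scores = {}
    for label, count in class_counts.items():
        # Prior P(A) times the product of the likelihoods P(B_i | A).
        score = count / len(labels)
        for i, value in enumerate(query):
            score *= feature_counts[(label, i, value)] / count
        scores[label] = score

    # Dividing by the sum of the scores plays the role of P(B).
    total = sum(scores.values())
    if total == 0:
        raise ValueError("query has zero probability under every class")
    return {label: score / total for label, score in scores.items()}

# Hypothetical data: (outlook, temperature) -> activity.
rows = [("sunny", "warm"), ("sunny", "cold"), ("rainy", "warm"), ("rainy", "cold")]
labels = ["play", "play", "play", "stay"]
print(naive_bayes_posteriors(rows, labels, ("rainy", "cold")))
```
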
                      Decision Tree: A model that classifies a data item into the class at a leaf
                      node, reached by following the branches of the tree whose conditions match the
                      properties of the data item.
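
A small sketch of how such a tree might be walked, assuming it is stored as nested dictionaries. The representation, the name `tree_classify`, and the example tree are hypothetical choices for illustration:

```python
def tree_classify(node, item):
    """Follow the branches matching item until a leaf (a class label) is reached."""
    # An internal node is a dict; anything else is treated as a leaf class label.
    while isinstance(node, dict):
        value = item[node["attribute"]]
        node = node["branches"][value]
    return node

# Hypothetical tree: test the outlook first, then the wind if it is rainy.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": "play",
        "rainy": {"attribute": "windy", "branches": {True: "stay", False: "play"}},
    },
}
print(tree_classify(tree, {"outlook": "rainy", "windy": True}))
```
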
                      Random Decision Tree: A decision tree in which every branch is formed using
                      only a random subset of the available variables during its construction.
                      Random Forest: An ensemble of random decision trees, each constructed on a
                      random subset of the data sampled with replacement, in which a data item is
                      assigned to the class that receives the majority vote from the trees.
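
The two ingredients named above, sampling with replacement and majority voting, can be sketched as follows. Tree construction is omitted; the stub trees, the helper names, and the seed are assumptions for illustration only:

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items from data at random, with replacement."""
    return rng.choices(data, k=len(data))

def forest_classify(trees, item):
    """Classify item by majority vote; each tree is a callable item -> class."""
    votes = Counter(tree(item) for tree in trees)
    return votes.most_common(1)[0][0]

# Each tree in a forest would be trained on its own bootstrap sample.
rng = random.Random(0)
data = list(range(10))
sample = bootstrap_sample(data, rng)

# Hypothetical stand-ins for trained trees.
trees = [
    lambda x: "a" if x[0] > 0 else "b",
    lambda x: "a" if x[1] > 0 else "b",
    lambda x: "b",
]
print(forest_classify(trees, (1, 1)))
```

Because the sample is drawn with replacement, it usually contains repeated items and omits others.
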
                      K-means algorithm: A clustering algorithm that divides the dataset into k
                      groups such that the members of each group are as similar as possible, that is,
                      closest to each other.
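
One common way to compute such a grouping is to alternate between assigning points to the nearest centroid and moving each centroid to the mean of its points. The sketch below assumes that procedure, Euclidean distance, caller-supplied starting centroids, and made-up points:

```python
import math

def k_means(points, centroids, iterations=10):
    """Run a fixed number of assign-and-update rounds from the given centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
    return centroids, clusters

# Hypothetical data: two visibly separate groups, k = 2.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, [(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

The result depends on the starting centroids, so in practice the procedure is often run from several starting points.
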
                      Regression analysis: A method of estimating the unknown parameters of a
                      functional model that predicts the output variable from the input variables, for
                      example, estimating a and b in the linear model y = a*x + b.
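
As an illustration, a and b in y = a*x + b can be estimated with the ordinary least-squares formulas. Least squares is an assumption here, since the entry does not name an estimation method, and `fit_line` and the data are hypothetical:

```python
def fit_line(xs, ys):
    """Estimate a and b in y = a*x + b by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    # This divides by zero if all xs are equal, where no unique line exists.
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - a * mean_x
    return a, b

# Hypothetical data lying exactly on y = 2*x + 1.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```
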